Abstract

The majority of First Order methods for large-scale convex-concave saddle point problems and
variational inequalities with monotone operators are proximal algorithms which at every iteration
need to minimize over the problem’s domain X the sum of a linear form and a strongly convex function.
To make such an algorithm practical, X should be proximal-friendly, that is, admit a strongly
convex function with easy-to-minimize linear perturbations. As a byproduct, X admits a computationally
cheap Linear Minimization Oracle (LMO) capable of minimizing linear forms over X.
There are, however, important situations where a cheap LMO is indeed available, but X is not
proximal-friendly, which motivates the search for algorithms based solely on LMO. For smooth convex
minimization, there exists a classical LMO-based algorithm, Conditional Gradient. In contrast,
the LMO-based techniques known to us for other problems with convex structure (nonsmooth convex
minimization, convex-concave saddle point problems, even as simple as bilinear ones, and variational
inequalities with monotone operators, even as simple as affine) are quite recent and utilize
a common approach based on Fenchel-type representations of the associated objectives/vector fields.
The goal of this paper is to develop alternative (and seemingly much simpler) decomposition-based
LMO techniques for bilinear saddle point problems and for variational inequalities with affine
monotone operators.


1 Introduction 

This paper is a follow-up to our paper [17] and, same as its predecessor, is motivated by the desire to
develop first order algorithms for solving a convex-concave saddle point problem (or a variational
inequality with monotone operator) on a convex domain X represented by a Linear Minimization Oracle (LMO)
capable of minimizing over X, at a reasonably low cost, any linear function. “LMO-representability” of
a convex domain X is an essentially weaker assumption than “proximal friendliness” of X (the possibility
to minimize over X, at a reasonably low cost, any linear perturbation of a properly selected strongly
convex function) underlying the vast majority of known first order algorithms. There are important
applications giving rise to LMO-represented domains which are not proximal friendly, most notably

• nuclear norm balls arising in low rank matrix recovery and in Semidefinite optimization; here
LMO reduces to approximating the leading pair of singular vectors of a matrix, while all known
proximal algorithms require the computationally much more costly full singular value decomposition,

• total variation balls arising in image reconstruction; here LMO reduces to solving a specific flow 
problem [14], while a proximal algorithm needs to solve a much more computationally demanding 
linearly constrained convex quadratic program, 

• some combinatorial polytopes. 

The needs of these applications inspire the current burst of activity in developing LMO-based
optimization techniques. For the most part, this activity has focused on smooth (or Lasso-type smooth
regularized) Convex Minimization over LMO-represented domains, where the classical Conditional
Gradient algorithm of Frank & Wolfe [7] and its modifications are applicable (see, e.g., [5, 6, 8, 9,
13, 14, 15, 16, 21] and references therein). LMO-based techniques for large-scale Nonsmooth Convex
Minimization (NCM), convex-concave Saddle Point problems (SP), even bilinear ones, and Variational
Inequalities (VI) with monotone operators, even affine ones, where no classical optimization methods
work, have been developed only recently. To the best of our knowledge, the related results reduce
to LMO-based techniques for large-scale NCM based on Nesterov’s smoothing [1, 20, 23, 24, 4, 18].
An alternative approach to NCM, based on Fenchel-type representations of convex functions and
processing the problems dual to the problem of interest induced by these representations, was developed
in [4] and was further extended in [17] to convex-concave SP’s and VI’s with monotone operators.
The goal of this paper is to develop a decomposition-based approach, alternative to [17], to solving
convex-concave SP’s and monotone VI’s on LMO-represented domains. In a nutshell, this approach
is extremely simple, and it makes sense to present an informal outline of it, in the SP case, right here.

Given convex compact sets $X_1, X_2, Y_1, Y_2$ in Euclidean spaces, consider a convex-concave
saddle point “master” problem

$$\min_{[x_1;x_2]\in X_1\times X_2}\ \max_{[y_1;y_2]\in Y_1\times Y_2}\ \Phi(x_1,x_2;y_1,y_2)$$

along with two “induced” problems

$$(P)\qquad \min_{x_1\in X_1}\max_{y_1\in Y_1}\Big[\phi(x_1,y_1) := \min_{x_2\in X_2}\max_{y_2\in Y_2}\Phi(x_1,x_2;y_1,y_2)\Big]$$

$$(D)\qquad \min_{x_2\in X_2}\max_{y_2\in Y_2}\Big[\psi(x_2,y_2) := \min_{x_1\in X_1}\max_{y_1\in Y_1}\Phi(x_1,x_2;y_1,y_2)\Big]$$

It is easily seen that (P) and (D) are convex-concave problems and a good approximate
solution to the master problem induces straightforwardly equally good approximate
solutions to (P) and to (D). More importantly, it turns out that when solving one of the
induced problems, say, (P), by an algorithm which is “intelligent” in a certain precise sense,
information acquired in the course of building an $\epsilon$-solution yields straightforwardly an $\epsilon$-solution
to the master problem, and thus yields an $\epsilon$-solution to the other induced problem, in our
case, to (D).

Now imagine that we want to solve a convex-concave SP problem which “as is” is too
complicated for the standard solution techniques (e.g., the problem’s domain is not
proximal-friendly, or is of huge dimension). Our proposed course of action is to make the problem
of interest the problem (D) stemming from a master problem built in a way which ensures
that the associated problem (P) is amenable to an “intelligent” solution algorithm $\mathcal{B}$.
After such a master problem is built, we solve (P) within a desired accuracy $\epsilon$ by $\mathcal{B}$ and use
the acquired information to build an $\epsilon$-solution to the problem of interest.

As we shall see, our decomposition approach can, in principle, handle general convex-concave SP’s and
affine VI’s. Our emphasis in this paper is, however, on bilinear SP’s and on VI’s with affine operators.
These are the cases which, on one hand, are of primary importance in numerous applications, and, on the
other hand, are the cases where our approach is easy to implement and where this approach seems to
be more flexible and much simpler than the machinery of Fenchel-type representations developed in
[17] (and in fact even covers this machinery, see section 3.3).

The rest of this paper is organized as follows. In sections 2 and 3 we present our decomposition-based
approach to SP problems and, respectively, to VI’s with monotone operators, with emphasis on utilizing
the approach to handle bilinear SP’s and, respectively, affine VI’s on LMO-represented domains. We illustrate
our constructions by applying them to a Colonel Blotto type matrix game (section 2.6.3) and to Nash
Equilibrium with pairwise interactions (section 3.2.2); in both these illustrations, decomposition allows
us to overcome difficulties coming from potentially huge ambient dimensions of the problems.

Proofs missing in the main body of the paper are relegated to the Appendix.

2 Decomposition of Convex-Concave Saddle Point Problems 

2.1 Situation 

In this section, we focus on the situation as follows. Given are 

1. convex compact sets $X_i$ in Euclidean spaces $\mathcal{X}_i$ and convex compact sets $Y_i$ in Euclidean spaces
$\mathcal{Y}_i$, $i = 1,2$;

2. convex compact sets $X$, $Y$ such that

$$X \subset X_1\times X_2 \subset \mathcal{X} := \mathcal{X}_1\times\mathcal{X}_2,\qquad Y \subset Y_1\times Y_2 \subset \mathcal{Y} := \mathcal{Y}_1\times\mathcal{Y}_2,$$

such that the projections of $X$ onto $\mathcal{X}_i$ are the sets $X_i$, and the projections of $Y$ onto $\mathcal{Y}_i$ are the
sets $Y_i$, $i = 1,2$. For $x_1\in X_1$, we set $X_2[x_1] = \{x_2 : [x_1;x_2]\in X\}\subset X_2$, and for $y_1\in Y_1$ we set
$Y_2[y_1] = \{y_2 : [y_1;y_2]\in Y\}\subset Y_2$. Similarly,

$$X_1[x_2] = \{x_1 : [x_1;x_2]\in X\},\ x_2\in X_2,\quad\text{and}\quad Y_1[y_2] = \{y_1 : [y_1;y_2]\in Y\},\ y_2\in Y_2;$$

3. Lipschitz continuous function

$$\Phi(x = [x_1;x_2];\ y = [y_1;y_2]) : X\times Y\to\mathbf{R} \qquad (1)$$

which is convex in $x\in X$ and concave in $y\in Y$.

We call the outlined situation a direct product one when $X = X_1\times X_2$ and $Y = Y_1\times Y_2$.

2.2 Induced convex-concave functions 

We associate with $\Phi$ primal and dual induced functions:

$$\phi(x_1,y_1) := \min_{x_2\in X_2[x_1]}\max_{y_2\in Y_2[y_1]}\Phi(x_1,x_2;y_1,y_2) = \max_{y_2\in Y_2[y_1]}\min_{x_2\in X_2[x_1]}\Phi(x_1,x_2;y_1,y_2) : X_1\times Y_1\to\mathbf{R},$$

$$\psi(x_2,y_2) := \min_{x_1\in X_1[x_2]}\max_{y_1\in Y_1[y_2]}\Phi(x_1,x_2;y_1,y_2) = \max_{y_1\in Y_1[y_2]}\min_{x_1\in X_1[x_2]}\Phi(x_1,x_2;y_1,y_2) : X_2\times Y_2\to\mathbf{R}$$

(the equalities are due to the convexity-concavity and continuity of $\Phi$ and the convexity and compactness
of $X_i[\cdot]$ and $Y_i[\cdot]$).

Recall that a Lipschitz continuous convex-concave function $\theta(u,v) : U\times V\to\mathbf{R}$ with convex
compact $U$, $V$ gives rise to the primal and dual problems

$$\mathrm{Opt}(P[\theta,U,V]) = \min_{u\in U}\Big[\overline{\theta}(u) := \max_{v\in V}\theta(u,v)\Big],$$

$$\mathrm{Opt}(D[\theta,U,V]) = \max_{v\in V}\Big[\underline{\theta}(v) := \min_{u\in U}\theta(u,v)\Big]$$

with equal optimal values:

$$\mathrm{SadVal}(\theta,U,V) := \mathrm{Opt}(P[\theta,U,V]) = \mathrm{Opt}(D[\theta,U,V]),$$

and also gives rise to the saddle point residual

$$\epsilon_{\mathrm{sad}}([u;v]\,|\,\theta,U,V) = \overline{\theta}(u) - \underline{\theta}(v) = \big[\overline{\theta}(u) - \mathrm{Opt}(P[\theta,U,V])\big] + \big[\mathrm{Opt}(D[\theta,U,V]) - \underline{\theta}(v)\big].$$
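To make the saddle point residual concrete, here is a minimal numerical sketch for the simplest convex-concave case, a bilinear function $\theta(u,v) = \langle u, Pv\rangle$ on a product of probability simplices. The payoff matrix `P`, the simplex domains and all function names are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def primal_value(P, u):
    """max over v in the simplex of <u, P v>: the largest entry of P^T u."""
    return float(np.max(P.T @ u))

def dual_value(P, v):
    """min over u in the simplex of <u, P v>: the smallest entry of P v."""
    return float(np.min(P @ v))

def saddle_residual(P, u, v):
    """Primal value at u minus dual value at v; zero exactly at saddle points."""
    return primal_value(P, u) - dual_value(P, v)

# Matching pennies: the uniform strategies form the saddle point.
P = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
eps_opt = saddle_residual(P, uniform, uniform)
eps_bad = saddle_residual(P, np.array([1.0, 0.0]), uniform)
```

In this toy instance the residual vanishes at the uniform pair and is positive as soon as the minimizing player commits to a pure strategy.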

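Lemma 1. $\phi$ and $\psi$ are convex-concave on their domains, are lower (upper) semicontinuous in their
“convex” (“concave”) arguments, and are Lipschitz continuous in the direct product case. Besides
this, it holds

$$\mathrm{SadVal}(\phi,X_1,Y_1) = \mathrm{SadVal}(\Phi,X,Y) = \mathrm{SadVal}(\psi,X_2,Y_2), \qquad (2)$$

and whenever $\bar x = [\bar x_1;\bar x_2]\in X$ and $\bar y = [\bar y_1;\bar y_2]\in Y$, one has

$$\epsilon_{\mathrm{sad}}([\bar x_1;\bar y_1]\,|\,\phi,X_1,Y_1) \le \epsilon_{\mathrm{sad}}([\bar x;\bar y]\,|\,\Phi,X,Y),\qquad \epsilon_{\mathrm{sad}}([\bar x_2;\bar y_2]\,|\,\psi,X_2,Y_2) \le \epsilon_{\mathrm{sad}}([\bar x;\bar y]\,|\,\Phi,X,Y). \qquad (3)$$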


The strategy for solving SP problems we intend to develop is as follows:

1. We represent the SP problem of interest as the dual SP problem

$$\min_{x_2\in X_2}\max_{y_2\in Y_2}\psi(x_2,y_2) \qquad (D)$$

induced by a master SP problem

$$\min_{[x_1;x_2]\in X}\ \max_{[y_1;y_2]\in Y}\Phi(x_1,x_2;y_1,y_2) \qquad (M)$$

The master SP problem is built in such a way that the associated primal SP problem

$$\min_{x_1\in X_1}\max_{y_1\in Y_1}\phi(x_1,y_1) \qquad (P)$$

admits a First Order oracle and can be solved by a traditional First Order method (e.g., a proximal
one).

2. We solve (P) to a desired accuracy by a First Order algorithm producing accuracy certificates [19]
and use these certificates to recover an approximate solution of the required accuracy to the problem
of interest.

We shall see that the outlined strategy (originating from [3]; in hindsight, a special case of this
strategy was used in [10, 11]) can be easily implemented when the
problem of interest is a bilinear SP on the direct product of two LMO-represented domains.

2.3 Regular sub- and supergradients

Implementing the outlined strategy requires some “agreement” between the first order information of
the master and the induced SP’s, and this is the issue we address now.

Given $\bar x_1\in X_1$, $\bar y_1\in Y_1$, let $\bar x_2\in X_2[\bar x_1]$ and $\bar y_2\in Y_2[\bar y_1]$ form a saddle point of the function
$\Phi(\bar x_1,x_2;\bar y_1,y_2)$ (min in $x_2\in X_2[\bar x_1]$, max in $y_2\in Y_2[\bar y_1]$). In this situation we say that $(\bar x = [\bar x_1;\bar x_2], \bar y = [\bar y_1;\bar y_2])$
belongs to the saddle point frontier of $\Phi$, and we denote this frontier by $\mathcal{S}$. Let now $\bar z = (\bar x = [\bar x_1;\bar x_2], \bar y = [\bar y_1;\bar y_2])\in\mathcal{S}$,
so that the function $\Phi(\bar x_1,x_2;\bar y_1,\bar y_2)$ attains its minimum over $x_2\in X_2[\bar x_1]$
at $\bar x_2$, and the function $\Phi(\bar x_1,\bar x_2;\bar y_1,y_2)$ attains its maximum over $y_2\in Y_2[\bar y_1]$ at $\bar y_2$. Consider a
subgradient $G$ of $\Phi(\cdot;\bar y_1,\bar y_2)$ taken at $\bar x$ along $X$: $G\in\partial_x\Phi(\bar x;\bar y)$. We say that $G$ is a regular subgradient
of $\Phi$ at $\bar z$, if for some $g\in\mathcal{X}_1$ it holds

$$\forall x = [x_1;x_2]\in X:\ \langle G, x - \bar x\rangle \ge \langle g, x_1 - \bar x_1\rangle;$$

every $g$ satisfying this relation is called compatible with $G$. Similarly, we say that a supergradient $H$
of $\Phi(\bar x;\cdot)$, taken at $\bar y$ along $Y$, is a regular supergradient of $\Phi$ at $\bar z$, if for some $h\in\mathcal{Y}_1$ it holds

$$\forall y = [y_1;y_2]\in Y:\ \langle H, y - \bar y\rangle \le \langle h, y_1 - \bar y_1\rangle,$$

and every $h$ satisfying this relation will be called compatible with $H$.

Remark 1. Let $X = X_1\times X_2$, $Y = Y_1\times Y_2$, meaning that we are in the direct product case. If $\Phi(x;\bar y)$
is differentiable in $x$ at $x = \bar x$, the partial gradient $\nabla_x\Phi(\bar x;\bar y)$ is a regular subgradient of $\Phi$ at $(\bar x,\bar y)$,
and $\nabla_{x_1}\Phi(\bar x;\bar y)$ is compatible with this subgradient:

$$\forall x = [x_1;x_2]\in X_1\times X_2:\quad \langle\nabla_x\Phi(\bar x;\bar y), x - \bar x\rangle = \langle\nabla_{x_1}\Phi(\bar x;\bar y), x_1 - \bar x_1\rangle + \underbrace{\langle\nabla_{x_2}\Phi(\bar x;\bar y), x_2 - \bar x_2\rangle}_{\ge 0} \ge \langle\nabla_{x_1}\Phi(\bar x;\bar y), x_1 - \bar x_1\rangle.$$

Similarly, if $\Phi(\bar x;y)$ is differentiable in $y$ at $y = \bar y$, then the partial gradient $\nabla_y\Phi(\bar x;\bar y)$ is a regular
supergradient of $\Phi$ at $(\bar x,\bar y)$, and $\nabla_{y_1}\Phi(\bar x;\bar y)$ is compatible with this supergradient.

Lemma 2. In the situation of section 2.1, let $\bar z = (\bar x = [\bar x_1;\bar x_2], \bar y = [\bar y_1;\bar y_2])\in\mathcal{S}$, let $G$ be a regular
subgradient of $\Phi$ at $\bar z$ and let $g$ be compatible with $G$. Let also $H$ be a regular supergradient of $\Phi$ at $\bar z$,
and $h$ be compatible with $H$. Then $g$ is a subgradient in $x_1$, taken at $(\bar x_1,\bar y_1)$ along $X_1$, of the induced
function $\phi$, and $h$ is a supergradient in $y_1$, taken at $(\bar x_1,\bar y_1)$ along $Y_1$, of the induced function $\phi$:

(a) $\phi(x_1,\bar y_1) \ge \phi(\bar x_1,\bar y_1) + \langle g, x_1 - \bar x_1\rangle$,

(b) $\phi(\bar x_1,y_1) \le \phi(\bar x_1,\bar y_1) + \langle h, y_1 - \bar y_1\rangle$

for all $x_1\in X_1$, $y_1\in Y_1$.

Regular sub- and supergradient fields of induced functions. In the sequel, we say that
$\phi'_{x_1}(\bar x_1,\bar y_1)$, $\phi'_{y_1}(\bar x_1,\bar y_1)$ are regular sub- and supergradient fields of $\phi$, if for every $(\bar x_1,\bar y_1)\in X_1\times Y_1$
and properly selected $\bar x_2$, $\bar y_2$ such that the point $\bar z = (\bar x = [\bar x_1;\bar x_2], \bar y = [\bar y_1;\bar y_2])$ is on the SP frontier
of $\Phi$, $\phi'_{x_1}(\bar x_1,\bar y_1)$, $\phi'_{y_1}(\bar x_1,\bar y_1)$ are the sub- and supergradients of $\phi$ induced, via Lemma 2, by regular
sub- and supergradients of $\Phi$ at $\bar z$. Invoking Remark 1, we arrive at the following observation:

Remark 2. Let $X = X_1\times X_2$, $Y = Y_1\times Y_2$, meaning that we are in the direct product case. If $\Phi$
is differentiable in $x$ and in $y$, then regular sub- and supergradient fields of $\phi$ can be built as follows:
given $(\bar x_1,\bar y_1)\in X_1\times Y_1$, we find $\bar x_2$, $\bar y_2$ such that the point $\bar z = (\bar x = [\bar x_1;\bar x_2], \bar y = [\bar y_1;\bar y_2])$ is on the SP
frontier of $\Phi$, and set

$$\phi'_{x_1}(\bar x_1,\bar y_1) = \nabla_{x_1}\Phi(\bar x_1,\bar x_2;\bar y_1,\bar y_2),\qquad \phi'_{y_1}(\bar x_1,\bar y_1) = \nabla_{y_1}\Phi(\bar x_1,\bar x_2;\bar y_1,\bar y_2). \qquad (4)$$

2.3.1 Existence of regular sub- and supergradients

The notion of regular subgradient deals with $\Phi$ as a function of $[x_1;x_2]\in X$ only, the $y$-argument
being fixed, so that the existence/description questions related to regular subgradients deal in fact with
a Lipschitz continuous convex function on $X$. And of course the questions about existence/description
of regular supergradients reduce straightforwardly to existence/description of regular subgradients (by
swapping the roles of $x$’s and $y$’s and passing from $\Phi$ to $-\Phi$). Thus, as far as existence and description
of regular sub- and supergradients are concerned, it suffices to consider the situation where

• $\Psi(x_1,x_2)$ is a Lipschitz continuous convex function on $X$,

• $\bar x_1\in X_1$, and $\bar x_2\in X_2[\bar x_1]$ is a minimizer of $\Psi(\bar x_1,x_2)$ over $x_2\in X_2[\bar x_1]$.

What we need to understand is when a subgradient $G$ of $\Psi$ taken at $\bar x = [\bar x_1;\bar x_2]$ along $X$ and some
$g$ satisfy the relation

$$\langle G, [x_1;x_2] - \bar x\rangle \ge \langle g, x_1 - \bar x_1\rangle\quad \forall x = [x_1;x_2]\in X \qquad (5)$$

and what can be said about the corresponding $g$’s. The answer is as follows:

Lemma 3. With $\Psi$, $\bar x_1$, $\bar x_2$ as above, $G\in\partial\Psi(\bar x)$ satisfies (5) if and only if the following two properties
hold:

(i) $G$ is a “certifying” subgradient of $\Psi$ at $\bar x$, meaning that $\langle G, [0; x_2 - \bar x_2]\rangle \ge 0$ $\forall x_2\in X_2[\bar x_1]$ (the
latter relation indeed certifies that $\bar x_2$ is a minimizer of $\Psi(\bar x_1,x_2)$ over $x_2\in X_2[\bar x_1]$);

(ii) $g$ is a subgradient, taken at $\bar x_1$ along $X_1$, of the convex function

$$\chi_G(x_1) = \min_{x_2\in X_2[x_1]}\langle G, [x_1;x_2]\rangle.$$

It is easily seen that with $\Psi$, $\bar x = [\bar x_1;\bar x_2]$ as in Lemma 3 (i.e., $\Psi$ is convex and Lipschitz continuous
on $X$, $\bar x_1\in X_1$, and $\bar x_2\in X_2[\bar x_1]$ minimizes $\Psi(\bar x_1,x_2)$ over $x_2\in X_2[\bar x_1]$) a certifying subgradient $G$
always exists; when $\Psi$ is differentiable at $\bar x$, one can take $G = \nabla\Psi(\bar x)$. The function $\chi_G(\cdot)$, however,
does not necessarily admit a subgradient at $\bar x_1$; when it does, every $g\in\partial\chi_G(\bar x_1)$ satisfies (5). In
particular,

1. [Direct Product case] When $X = X_1\times X_2$, representing a certifying subgradient $G$ of $\Psi$, taken
at $[\bar x_1;\ \bar x_2\in\mathrm{Argmin}_{x_2\in X_2}\Psi(\bar x_1,x_2)]$, as $[g;h]$, we have

$$\langle h, x_2 - \bar x_2\rangle \ge 0\quad \forall x_2\in X_2,$$

whence $\chi_G(x_1) = \langle g, x_1\rangle + \langle h, \bar x_2\rangle$, and thus $g$ is a subgradient of $\chi_G$ at $\bar x_1$. In particular, in the
direct product case and when $\Psi$ is differentiable at $\bar x$, (5) is met by $G = \nabla\Psi(\bar x)$, $g = \nabla_{x_1}\Psi(\bar x)$;

2. [Polyhedral case] When $X$ is a polyhedral set, for every certifying subgradient $G$ of $\Psi$ the function
$\chi_G$ is polyhedrally representable with domain $X_1$ and as such has a subgradient at every point
from $X_1$;

3. [Interior case] When $\bar x_1$ is a point from the relative interior of $X_1$, $\chi_G$ definitely has a subgradient
at $\bar x_1$.

2.4 Main Result, Saddle Point case 

2.4.1 Preliminaries: execution protocols, accuracy certificates, residuals 

We start with outlining some simple concepts originating from [19]. Let $W$ be a convex compact set
in Euclidean space $\mathcal{W}$, and $M(w) : W\to\mathcal{W}$ be a vector field on $W$. A $t$-step execution protocol
associated with $M$, $W$ is a collection $\mathcal{I}_t = \{w_i\in W, M(w_i) : 1\le i\le t\}$. A $t$-step accuracy certificate is
a $t$-dimensional probabilistic (i.e., with nonnegative entries summing up to 1) vector $\lambda$. Augmenting
a $t$-step execution protocol $\mathcal{I}_t$ by a $t$-step accuracy certificate $\lambda$ gives rise to two entities:

approximate solution: $w^t = w^t(\mathcal{I}_t,\lambda) := \sum_{i=1}^t\lambda_i w_i\in W$;

residual: $\mathrm{Res}(\mathcal{I}_t,\lambda\,|\,W) = \max_{w\in W}\sum_{i=1}^t\lambda_i\langle M(w_i), w_i - w\rangle$.
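The two entities above are cheap to compute once a Linear Minimization Oracle for $W$ is available, since the maximum in the residual is a linear minimization in disguise. The following sketch assumes, purely for illustration, that $W$ is a probability simplex (so the LMO just picks the smallest entry); the function names and the toy data are not from the paper.

```python
import numpy as np

def approximate_solution(points, lam):
    """w^t = sum_i lam_i * w_i (rows of `points` are the search points w_i)."""
    return np.asarray(lam, float) @ np.asarray(points, float)

def residual_on_simplex(points, field_values, lam):
    """Res = max over w in the simplex of sum_i lam_i <M(w_i), w_i - w>."""
    points = np.asarray(points, float)
    field_values = np.asarray(field_values, float)
    lam = np.asarray(lam, float)
    # sum_i lam_i <M(w_i), w_i>
    inner = float(lam @ np.sum(field_values * points, axis=1))
    # min over the simplex of <sum_i lam_i M(w_i), w> is the smallest entry
    aggregated = lam @ field_values
    return inner - float(np.min(aggregated))

# Toy protocol: constant field M(w) = c (the gradient of the linear form <c, w>).
c = np.array([1.0, 2.0, 3.0])
points = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
field_values = np.array([c, c])
lam = np.array([0.5, 0.5])
w_t = approximate_solution(points, lam)
res = residual_on_simplex(points, field_values, lam)
```

For this linear toy field the residual equals the objective gap $\langle c, w^t\rangle - \min_{w\in W}\langle c, w\rangle$ of the approximate solution.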

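When $W = U\times V$, where $U$ is a closed convex subset of Euclidean space $\mathcal{U}$ and $V$ is a closed
convex subset of Euclidean space $\mathcal{V}$, and $M$ is the vector field induced by a convex-concave function $\theta(u,v) : U\times V\to\mathbf{R}$,
that is,

$$M(u,v) = [M_u(u,v); M_v(u,v)] : U\times V\to\mathcal{U}\times\mathcal{V}\quad\text{with}\quad M_u(u,v)\in\partial_u\theta(u,v),\ M_v(u,v)\in\partial_v[-\theta(u,v)]$$

(such a field always is monotone), an execution protocol associated with $(M,W)$ will also be called a
protocol associated with $\theta$, $U$, $V$, or a protocol associated with the saddle point problem

$$\min_{u\in U}\max_{v\in V}\theta(u,v).$$

The importance of these notions in our context stems from the following simple observation [19]:

Proposition 1. Let $U$, $V$ be nonempty convex compact domains in Euclidean spaces $\mathcal{U}$, $\mathcal{V}$, $\theta(u,v) : U\times V\to\mathbf{R}$
be a convex-concave function, and $M$ be the induced monotone vector field: $M(u,v) = [M_u(u,v); M_v(u,v)] : U\times V\to\mathcal{U}\times\mathcal{V}$
with $M_u(u,v)\in\partial_u\theta(u,v)$, $M_v(u,v)\in\partial_v[-\theta(u,v)]$. For a
$t$-step execution protocol $\mathcal{I}_t = \{w_i = [u_i;v_i]\in W := U\times V,\ M_i = [M_u(u_i,v_i); M_v(u_i,v_i)] : 1\le i\le t\}$
associated with $\theta$, $U$, $V$, and a $t$-step accuracy certificate $\lambda$, it holds

$$\epsilon_{\mathrm{sad}}(w^t(\mathcal{I}_t,\lambda)\,|\,\theta,U,V) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,U\times V). \qquad (7)$$

Indeed, for $[u;v]\in U\times V$ we have

$$\mathrm{Res}(\mathcal{I}_t,\lambda\,|\,U\times V) \ge \sum_{i=1}^t\lambda_i\langle M(w_i), w_i - [u;v]\rangle = \sum_{i=1}^t\lambda_i\Big[\underbrace{\langle M_u(u_i,v_i), u_i - u\rangle}_{\ge\,\theta(u_i,v_i)-\theta(u,v_i)} + \underbrace{\langle M_v(u_i,v_i), v_i - v\rangle}_{\ge\,\theta(u_i,v)-\theta(u_i,v_i)}\Big]$$

$$\ge \sum_{i=1}^t\lambda_i[\theta(u_i,v) - \theta(u,v_i)] \ge \theta(u^t,v) - \theta(u,v^t),$$

where $w^t = [u^t;v^t]$ and the inequalities are due to the origin of $M$ and the convexity-concavity of $\theta$. The resulting inequality
holds true for all $[u;v]\in U\times V$, and (7) follows. □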
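Inequality (7) is easy to check numerically. The sketch below does so for a bilinear $\theta(u,v) = \langle u, Pv\rangle$ on a product of two probability simplices, with randomly drawn search points and a random certificate; the matrix `P`, the dimensions and the random data are assumptions made only for this illustration. For a bilinear $\theta$ the terms $\langle M(w_i), w_i\rangle$ cancel, so the two sides of (7) coincide up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((4, 3))           # theta(u, v) = <u, P v>
t = 6
us = rng.dirichlet(np.ones(4), size=t)    # search points u_i in the simplex U
vs = rng.dirichlet(np.ones(3), size=t)    # search points v_i in the simplex V
lam = rng.dirichlet(np.ones(t))           # accuracy certificate

# Monotone field M = [M_u; M_v] with M_u = grad_u theta, M_v = grad_v(-theta).
Mu = vs @ P.T                             # row i is P v_i
Mv = -(us @ P)                            # row i is -P^T u_i

u_t, v_t = lam @ us, lam @ vs             # approximate solution w^t = [u^t; v^t]

# Residual over U x V: the maximization splits into two simplex LMOs.
inner = lam @ (np.sum(Mu * us, axis=1) + np.sum(Mv * vs, axis=1))
res = float(inner - np.min(lam @ Mu) - np.min(lam @ Mv))

# Saddle point residual of w^t: max_v theta(u^t, v) - min_u theta(u, v^t).
eps_sad = float(np.max(P.T @ u_t) - np.min(P @ v_t))
```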


2.4.2 Main Result 


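Proposition 2. In the situation and notation of sections 2.1 - 2.3, let $\phi$ be the primal convex-concave
function induced by $\Phi$, and let

$$\mathcal{I}_t = \big\{[x_{1,i};y_{1,i}]\in X_1\times Y_1,\ [\alpha_i := \phi'_{x_1}(x_{1,i},y_{1,i});\ \beta_i := -\phi'_{y_1}(x_{1,i},y_{1,i})] : 1\le i\le t\big\}$$

be an execution protocol associated with $\phi$, $X_1$, $Y_1$, where $\phi'_{x_1}$, $\phi'_{y_1}$ are regular sub- and supergradient
fields associated with $\Phi$, $\phi$. Due to the origin of $\phi$, $\phi'_{x_1}$, $\phi'_{y_1}$, there exist $x_{2,i}\in X_2[x_{1,i}]$, $G_i\in\mathcal{X}$,
$y_{2,i}\in Y_2[y_{1,i}]$, and $H_i\in\mathcal{Y}$ such that

(a) $G_i\in\partial_x\Phi(x_i := [x_{1,i};x_{2,i}],\ y_i := [y_{1,i};y_{2,i}])$,

(b) $H_i\in\partial_y\big[-\Phi(x_i := [x_{1,i};x_{2,i}],\ y_i := [y_{1,i};y_{2,i}])\big]$,

(c) $\langle G_i, x - [x_{1,i};x_{2,i}]\rangle \ge \langle\phi'_{x_1}(x_{1,i},y_{1,i}), x_1 - x_{1,i}\rangle$ $\forall x = [x_1;x_2]\in X$,

(d) $\langle H_i, y - [y_{1,i};y_{2,i}]\rangle \ge \langle-\phi'_{y_1}(x_{1,i},y_{1,i}), y_1 - y_{1,i}\rangle$ $\forall y = [y_1;y_2]\in Y$, (8)

implying that

$$\mathcal{J}_t = \big\{z_i = [x_i = [x_{1,i};x_{2,i}];\ y_i = [y_{1,i};y_{2,i}]],\ F_i = [G_i;H_i] : 1\le i\le t\big\}$$

is an execution protocol associated with $\Phi$, $X$, $Y$. For every accuracy certificate $\lambda$ it holds

$$\mathrm{Res}(\mathcal{J}_t,\lambda\,|\,X\times Y) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,X_1\times Y_1). \qquad (9)$$

As a result, given an accuracy certificate $\lambda$ and setting

$$[x^t;y^t] = [[x_1^t;x_2^t];[y_1^t;y_2^t]] = \sum_{i=1}^t\lambda_i\,[[x_{1,i};x_{2,i}];[y_{1,i};y_{2,i}]],$$

we ensure that

$$\epsilon_{\mathrm{sad}}([x^t;y^t]\,|\,\Phi,X,Y) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,X_1\times Y_1), \qquad (10)$$

whence also, by Lemma 1,

$$\epsilon_{\mathrm{sad}}([x_1^t;y_1^t]\,|\,\phi,X_1,Y_1) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,X_1\times Y_1),\qquad \epsilon_{\mathrm{sad}}([x_2^t;y_2^t]\,|\,\psi,X_2,Y_2) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,X_1\times Y_1), \qquad (11)$$

where $\psi$ is the dual function induced by $\Phi$.

Proof. Let $z := [[u_1;u_2];[v_1;v_2]]\in X\times Y$. Then

$$\sum_{i=1}^t\lambda_i\langle F_i, z_i - z\rangle = \sum_{i=1}^t\lambda_i\Big[\underbrace{\langle G_i, [x_{1,i};x_{2,i}] - [u_1;u_2]\rangle}_{\le\,\langle\alpha_i,\,x_{1,i}-u_1\rangle\ \text{by (8.c)}} + \underbrace{\langle H_i, [y_{1,i};y_{2,i}] - [v_1;v_2]\rangle}_{\le\,\langle\beta_i,\,y_{1,i}-v_1\rangle\ \text{by (8.d)}}\Big]$$

$$\le \sum_{i=1}^t\lambda_i\big[\langle\alpha_i, x_{1,i} - u_1\rangle + \langle\beta_i, y_{1,i} - v_1\rangle\big] \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,X_1\times Y_1),$$

and (9) follows. □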


2.5 Application: Solving bilinear Saddle Point problems on domains represented 
by Linear Minimization Oracles 

2.5.1 Situation 

Let $W$ be a nonempty convex compact set in $\mathbf{R}^N$, $Z$ be a nonempty convex compact set in $\mathbf{R}^M$, and
let $\psi : W\times Z\to\mathbf{R}$ be a bilinear convex-concave function:

$$\psi(w,z) = \langle w,p\rangle + \langle z,q\rangle + \langle z,Sw\rangle. \qquad (12)$$

Our goal is to solve the convex-concave SP problem

$$\min_{w\in W}\max_{z\in Z}\psi(w,z) \qquad (13)$$

given by $\psi$, $W$, $Z$.






2.5.2 Simple observation 

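We intend to show that $\psi$ can be represented (in fact, in many ways) as the dual function induced by
a bilinear convex-concave function $\Phi$; this is the key element of the strategy for
solving (13) outlined in section 2.2.

In the situation described in section 2.5.1, let $U\subset\mathbf{R}^n$, $V\subset\mathbf{R}^m$ be convex compact sets, and let
$D\in\mathbf{R}^{m\times N}$, $A\in\mathbf{R}^{n\times M}$, $R\in\mathbf{R}^{m\times n}$. Consider the bilinear (and thus convex-concave) function

$$\Phi(u,w;v,z) = \langle w, p + D^Tv\rangle + \langle z, q + A^Tu\rangle - \langle v, Ru\rangle : [U\times W]\times[V\times Z]\to\mathbf{R} \qquad (14)$$

(the “convex” argument is $(u,w)$, the “concave” one is $(v,z)$). Assume that a pair of functions

$$u(w,z) : W\times Z\to U,\qquad v(w,z) : W\times Z\to V \qquad (15)$$

can be found such that

$$\forall (w,z)\in W\times Z:\ Dw = Ru(w,z),\qquad \forall (w,z)\in W\times Z:\ Az = R^Tv(w,z). \qquad (16)$$

Denoting $\bar u = u(w,z)$, $\bar v = v(w,z)$, we have

(a) $\langle w, D^T\bar v\rangle = \langle Dw,\bar v\rangle = \langle R\bar u,\bar v\rangle$,

(b) $\langle z, A^T\bar u\rangle = \langle Az,\bar u\rangle = \langle\bar u, R^T\bar v\rangle = \langle R\bar u,\bar v\rangle$. (17)

Thus,

$$\nabla_u\Phi(\bar u,w;\bar v,z) = Az - R^T\bar v = 0,\qquad \nabla_v\Phi(\bar u,w;\bar v,z) = Dw - R\bar u = 0,$$

whence

$$\min_{u\in U}\max_{v\in V}\Phi(u,w;v,z) = \Phi(u(w,z),w;v(w,z),z) = \langle w,p\rangle + \langle z,q\rangle + \langle Dw, v(w,z)\rangle\quad\text{[by (17)]}.$$

We have proved

Lemma 4. In the case of (15), (16), assuming that

$$\langle Dw, v(w,z)\rangle = \langle z, Sw\rangle\quad \forall (w\in W, z\in Z), \qquad (18)$$

$\psi$ is the dual convex-concave function induced by $\Phi$ and the domains $U\times W$, $V\times Z$.

Note that there are easy ways to ensure (16) and (18).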


Example 1. Here $m = M$, $n = N$, and $D = A^T = R = S$. Assuming $U\supset W$, $V\supset Z$ and setting
$u(w,z) = w$, $v(w,z) = z$, we ensure (15), (16) and (18).

Example 2. Let $S = A^TD$ with $A\in\mathbf{R}^{K\times M}$, $D\in\mathbf{R}^{K\times N}$. Setting $m = n = K$, $R = I_K$,
$u(w,z) = Dw$, $v(w,z) = Az$ and assuming that $U\supset DW$, $V\supset AZ$, we again ensure (15), (16) and
(18).
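The construction of Example 2 can be checked numerically. The sketch below draws random data (the dimensions, the random matrices and the variable names are assumptions for illustration only), builds $\Phi$ from (14) with $R = I_K$, and verifies that at $\bar u = Dw$, $\bar v = Az$ both partial gradients in $(u,v)$ vanish and $\Phi$ takes the value $\psi(w,z)$, as Lemma 4 states.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 3, 5, 4
A = rng.standard_normal((K, M))
D = rng.standard_normal((K, N))
S = A.T @ D                                # S = A^T D as in Example 2
p, q = rng.standard_normal(N), rng.standard_normal(M)
R = np.eye(K)

def psi(w, z):
    """Bilinear function (12)."""
    return w @ p + z @ q + z @ (S @ w)

def Phi(u, w, v, z):
    """Master function (14)."""
    return w @ (p + D.T @ v) + z @ (q + A.T @ u) - v @ (R @ u)

w = rng.dirichlet(np.ones(N))              # a point of W (here: a simplex)
z = rng.dirichlet(np.ones(M))              # a point of Z (here: a simplex)
u_bar, v_bar = D @ w, A @ z                # u(w, z), v(w, z) from Example 2

grad_u = A @ z - R.T @ v_bar               # gradient of Phi in u at (u_bar, v_bar)
grad_v = D @ w - R @ u_bar                 # gradient of Phi in v at (u_bar, v_bar)
gap = abs(Phi(u_bar, w, v_bar, z) - psi(w, z))
```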
2.5.3 Implications

Assume that (15), (16) and (18) take place. Renaming the variables according to $x_1 = u$, $y_1 = v$,
$x_2 = w$, $y_2 = z$ and setting $X_1 = U$, $X_2 = W$, $Y_1 = V$, $Y_2 = Z$, $X = X_1\times X_2 = U\times W$,
$Y = Y_1\times Y_2 = V\times Z$, we find ourselves in the direct product case of the situation of section 2.1, and
Lemma 4 says that the bilinear SP problem of interest (12), (13) is the dual SP problem associated
with the bilinear master SP problem

$$\min_{[u;w]\in U\times W}\ \max_{[v;z]\in V\times Z}\big[\Phi(u,w;v,z) = \langle w, p + D^Tv\rangle + \langle z, q + A^Tu\rangle - \langle Ru,v\rangle\big]. \qquad (19)$$

Since $\Phi$ is linear in $[w;z]$, the primal SP problem associated with (19) is

$$\min_{u = x_1\in U = X_1}\ \max_{v = y_1\in V = Y_1}\Big[\phi(u,v) = \min_{w\in W}\langle w, p + D^Tv\rangle + \max_{z\in Z}\langle z, q + A^Tu\rangle - \langle Ru,v\rangle\Big].$$

Assuming that $W$, $Z$ allow for cheap Linear Minimization Oracles and defining $w_*(\cdot)$, $z_*(\cdot)$ according
to

$$w_*(\xi)\in\mathrm{Argmin}_{w\in W}\langle w,\xi\rangle,\qquad z_*(\eta)\in\mathrm{Argmin}_{z\in Z}\langle z,\eta\rangle,$$

we have

$$\phi(u,v) = \langle w_*(p + D^Tv), p + D^Tv\rangle + \langle z_*(-q - A^Tu), q + A^Tu\rangle - \langle Ru,v\rangle,$$

$$\phi'_u(u,v) := Az_*(-q - A^Tu) - R^Tv\in\partial_u\phi(u,v), \qquad (20)$$

$$\phi'_v(u,v) := Dw_*(p + D^Tv) - Ru\in-\partial_v[-\phi(u,v)],$$

that is, first order information on the primal SP problem

$$\min_{u\in U}\max_{v\in V}\phi(u,v) \qquad (21)$$

is available. Note that since we are in the direct product case, $\phi'_u$, $\phi'_v$ are regular sub- and
supergradient fields associated with $\Phi$, $\phi$.
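The first order oracle (20) only needs two LMO calls per query. A minimal sketch follows, under the illustrative assumption that $W$ and $Z$ are probability simplices (so each LMO returns a vertex); `lmo_simplex`, `primal_oracle` and the random test data are hypothetical names and values introduced here, not part of the paper.

```python
import numpy as np

def lmo_simplex(xi):
    """Vertex of the probability simplex minimizing the linear form <., xi>."""
    e = np.zeros(len(xi))
    e[int(np.argmin(xi))] = 1.0
    return e

def primal_oracle(u, v, A, D, R, p, q):
    """Value of phi and the sub-/supergradients phi'_u, phi'_v from (20)."""
    w_star = lmo_simplex(p + D.T @ v)       # w_*(p + D^T v)
    z_star = lmo_simplex(-q - A.T @ u)      # z_*(-q - A^T u)
    value = w_star @ (p + D.T @ v) + z_star @ (q + A.T @ u) - (R @ u) @ v
    grad_u = A @ z_star - R.T @ v           # subgradient of phi in u
    grad_v = D @ w_star - R @ u             # supergradient of phi in v
    return value, grad_u, grad_v

rng = np.random.default_rng(0)
n, m, M, N = 3, 2, 5, 4
A = rng.standard_normal((n, M))
D = rng.standard_normal((m, N))
R = rng.standard_normal((m, n))
p, q = rng.standard_normal(N), rng.standard_normal(M)
u, v = rng.standard_normal(n), rng.standard_normal(m)
u2, v2 = rng.standard_normal(n), rng.standard_normal(m)

phi, gu, gv = primal_oracle(u, v, A, D, R, p, q)
phi_u2 = primal_oracle(u2, v, A, D, R, p, q)[0]   # phi at a different u
phi_v2 = primal_oracle(u, v2, A, D, R, p, q)[0]   # phi at a different v
```

The values `phi_u2` and `phi_v2` are there to check the sub- and supergradient inequalities at a second point.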

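Now let $\mathcal{I}_t = \{[u_i;v_i]\in U\times V,\ [\gamma_i := \phi'_u(u_i,v_i);\ \delta_i := -\phi'_v(u_i,v_i)] : 1\le i\le t\}$ be an execution
protocol generated by a First Order algorithm as applied to the primal SP problem (21), and let

$$w_i = w_*(p + D^Tv_i),\qquad z_i = z_*(-q - A^Tu_i),$$

$$\alpha_i = \nabla_w\Phi(u_i,w_i;v_i,z_i) = p + D^Tv_i,$$

$$\beta_i = -\nabla_z\Phi(u_i,w_i;v_i,z_i) = -q - A^Tu_i,$$

so that

$$\mathcal{J}_t = \Big\{[[u_i;w_i];[v_i;z_i]],\ \big[\underbrace{[\gamma_i;\alpha_i]}_{\nabla_{[u;w]}\Phi(u_i,w_i;v_i,z_i)};\ \underbrace{[\delta_i;\beta_i]}_{-\nabla_{[v;z]}\Phi(u_i,w_i;v_i,z_i)}\big] : 1\le i\le t\Big\}$$

is an execution protocol associated with the SP problem (19). By Proposition 2, for any accuracy
certificate $\lambda$ it holds

$$\mathrm{Res}(\mathcal{J}_t,\lambda\,|\,U\times W\times V\times Z) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,U\times V), \qquad (22)$$

whence, setting

$$[[u^t;w^t];[v^t;z^t]] = \sum_{i=1}^t\lambda_i\,[[u_i;w_i];[v_i;z_i]] \qquad (23)$$

and invoking Proposition 1 with $\Phi$ in the role of $\theta$,

$$\epsilon_{\mathrm{sad}}\big([[u^t;w^t];[v^t;z^t]]\,\big|\,\Phi,\ \underbrace{X_1\times X_2}_{U\times W},\ \underbrace{Y_1\times Y_2}_{V\times Z}\big) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,U\times V), \qquad (24)$$

whence, by Lemma 1,

$$\epsilon_{\mathrm{sad}}([w^t;z^t]\,|\,\psi,W,Z) \le \mathrm{Res}(\mathcal{I}_t,\lambda\,|\,U\times V). \qquad (25)$$

We have arrived at the following

Proposition 3. In the situation of section 2.5.1, let (15), (16) and (18) take place. Then, applying
to the primal SP problem (21) a First Order algorithm $\mathcal{B}$ with accuracy certificates, the $t$-step execution
protocol $\mathcal{I}_t$ and accuracy certificate $\lambda^t$ generated by $\mathcal{B}$ yield straightforwardly a feasible solution to the
SP problem of interest (12) - (13) of $\epsilon_{\mathrm{sad}}$-inaccuracy $\le\mathrm{Res}(\mathcal{I}_t,\lambda^t\,|\,U\times V)$.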

Note also that when the constructions from Examples 1, 2 are used, there is significant freedom
in selecting the domain $U\times V$ of the primal problem (we only require $U$, $V$ to be convex compact
sets “large enough” to ensure the inclusions mentioned in the Examples), so that there is no difficulty in
enforcing $U$, $V$ to be proximal friendly. As a result, we can take as $\mathcal{B}$ a proximal First Order method,
for example, the Non-Euclidean Restricted Memory Level algorithm with certificates (cf. [4]) or Mirror
Descent (cf. [17]). The efficiency estimates of these algorithms as given in [4, 17] imply that the
resulting procedure for solving the SP of interest (12) - (13) admits a non-asymptotic $O(1/\sqrt{t})$ rate of
convergence, with explicitly computable factors hidden in $O(\cdot)$. The resulting complexity bound is
completely similar to the one achievable with the machinery of Fenchel-type representations [4, 17].
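The recipe of Proposition 3 can be sketched end to end for a small matrix game with $S = A^TD$ (Example 2, $p = 0$, $q = 0$, $R = I_K$). The sketch below is an illustration under stated assumptions and not the authors' implementation: $W$, $Z$ are probability simplices, $U$, $V$ are centered Euclidean balls just large enough to contain $DW$ and $AZ$, and the algorithm $\mathcal{B}$ is replaced by a plain projected subgradient method with the uniform certificate $\lambda_i = 1/t$ (rather than the Level or Mirror Descent methods cited above). All function names are hypothetical.

```python
import numpy as np

def lmo_simplex(xi):
    """Vertex of the probability simplex minimizing the linear form <., xi>."""
    e = np.zeros(len(xi))
    e[int(np.argmin(xi))] = 1.0
    return e

def project_ball(x, radius):
    """Euclidean projection onto the centered ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def solve_by_decomposition(A, D, steps=500):
    K = A.shape[0]
    r_u = np.linalg.norm(D, axis=0).max()    # ball U contains D W
    r_v = np.linalg.norm(A, axis=0).max()    # ball V contains A Z
    u, v = np.zeros(K), np.zeros(K)
    lam = 1.0 / steps                        # uniform accuracy certificate
    w_t, z_t = np.zeros(D.shape[1]), np.zeros(A.shape[1])
    g_sum, d_sum, inner = np.zeros(K), np.zeros(K), 0.0
    for i in range(1, steps + 1):
        w_i = lmo_simplex(D.T @ v)           # w_*(D^T v_i)
        z_i = lmo_simplex(-(A.T @ u))        # z_*(-A^T u_i)
        gamma = A @ z_i - v                  # phi'_u(u_i, v_i)
        delta = u - D @ w_i                  # -phi'_v(u_i, v_i)
        w_t += lam * w_i
        z_t += lam * z_i
        g_sum += lam * gamma
        d_sum += lam * delta
        inner += lam * (gamma @ u + delta @ v)
        step = 1.0 / np.sqrt(i)
        u = project_ball(u - step * gamma, r_u)
        v = project_ball(v - step * delta, r_v)
    # Res(I_t, lam | U x V): linear maximization over a ball gives radius * norm.
    res = inner + r_u * np.linalg.norm(g_sum) + r_v * np.linalg.norm(d_sum)
    S = A.T @ D
    eps_sad = float(np.max(S @ w_t) - np.min(S.T @ z_t))
    return w_t, z_t, eps_sad, float(res)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
D = rng.standard_normal((3, 5))
w_t, z_t, eps_sad, res = solve_by_decomposition(A, D)
```

Whatever the quality of the iterates, the bound (25) guarantees that `eps_sad` does not exceed `res`, so `res` serves as an online, computable accuracy certificate for the recovered pair `(w_t, z_t)`.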

We are about to consider a special case where the $O(1/\sqrt{t})$ complexity admits a significant
improvement.


2.6 Matrix Game case 

Let $S\in\mathbf{R}^{M\times N}$ admit the representation

$$S = A^TD$$

with $A\in\mathbf{R}^{K\times M}$ and $D\in\mathbf{R}^{K\times N}$. Let also $W = \Delta_N = \{w\in\mathbf{R}^N_+ : \sum_i w_i = 1\}$, $Z = \Delta_M$. Our goal is
to solve the matrix game

$$\min_{w\in W}\max_{z\in Z}\big[\psi(w,z) = \langle z, Sw\rangle = \langle Az, Dw\rangle\big]. \qquad (26)$$

Let $U$, $V$ be convex compact sets such that

$$V\supset AZ,\qquad U\supset DW, \qquad (27)$$

and let us set

$$\Phi(u,w;v,z) = \langle u, Az\rangle + \langle v, Dw\rangle - \langle u,v\rangle,$$

$$\bar u := u(w,z) = Dw,\qquad \bar v := v(w,z) = Az,$$

implying that

$$\nabla_u\Phi(\bar u,w;\bar v,z) = Az - \bar v = 0,$$

$$\nabla_v\Phi(\bar u,w;\bar v,z) = Dw - \bar u = 0,$$

$$\Phi(\bar u,w;\bar v,z) = \langle\bar u, Az\rangle + \langle\bar v, Dw\rangle - \langle\bar u,\bar v\rangle = \langle Dw, Az\rangle + \langle Az, Dw\rangle - \langle Dw, Az\rangle = \langle z, A^TDw\rangle = \psi(w,z).$$

It is immediately seen that the function $\psi$ from (26) is nothing but the dual convex-concave function
associated with $\Phi$ (cf. Example 2), while the primal function is

$$\phi(u,v) = \mathrm{Max}(A^Tu) + \mathrm{Min}(D^Tv) - \langle u,v\rangle; \qquad (28)$$

here $\mathrm{Min}(p)$ and $\mathrm{Max}(p)$ stand for the smallest and the largest entries in vector $p$. Applying the
strategy outlined in section 2.2, we can solve the problem of interest (26) by applying to the primal SP
problem

$$\min_{u\in U}\max_{v\in V}\big[\phi(u,v) = \mathrm{Min}(D^Tv) + \mathrm{Max}(A^Tu) - \langle u,v\rangle\big] \qquad (29)$$

an algorithm with accuracy certificates and using the machinery outlined in previous sections to
convert the resulting execution protocols and certificates into approximate solutions to the problem
of interest (26).

We intend to consider a special case when the outlined approach allows us to reduce a huge, but
simple, matrix game (26) to a small SP problem (29), so small that it can be solved to high accuracy
by a cutting plane method (e.g., the Ellipsoid algorithm). This is the case when the matrices $A$, $D$ in
(26) are simple.

2.6.1 Simple matrices 

Given a $K\times L$ matrix $B$, we call $B$ simple if, given $x\in\mathbf{R}^K$, it is easy to identify the columns $\overline{B}[x]$,
$\underline{B}[x]$ of $B$ making the maximal, resp. the minimal, inner product with $x$.

When matrices $A$, $D$ in (26) are simple, the first order information for the cost function $\phi$ in
the primal SP problem (29) is easy to get. Besides, all we need from the convex compact sets $U$, $V$
participating in (29) is to be large enough to ensure that $U\supset DW$ and $V\supset AZ$, which allows us to make
$U$ and $V$ simple, e.g., Euclidean balls. Finally, when the design dimension $2K$ of (29) is small, we have
at our disposal a multitude of linearly converging methods for solving (29), with the convergence ratio
depending solely on $K$, including the Ellipsoid algorithm with certificates presented in [19]. We are
about to demonstrate that the outlined situation indeed takes place in some meaningful applications.

2.6.2 Example: Knapsack generated matrices 

Assume that we are given knapsack data (the construction to follow can be easily extended from
“knapsack generated” matrices to more general “Dynamic Programming generated” ones, see section A.4
in the Appendix), namely,

• positive integer horizon $m$,

• nonnegative integer bounds $\bar p_s$, $1\le s\le m$,

• positive integer costs $h_s$, $1\le s\le m$, and positive integer budget $H$, and

• output functions $f_s(\cdot) : \{0,1,...,\bar p_s\}\to\mathbf{R}^{r_s}$, $1\le s\le m$.

Given the outlined data, consider the set $\mathcal{P}$ of all integer vectors $p = [p_1;...;p_m]\in\mathbf{R}^m$ satisfying the
following restrictions:

$$0\le p_s\le\bar p_s,\ 1\le s\le m\quad\text{[range restriction]}$$

$$\sum_{s=1}^m h_sp_s\le H\quad\text{[budget restriction]}$$

and the matrix $B$ of the size $K\times\mathrm{Card}(\mathcal{P})$, $K = \sum_{s=1}^m r_s$, defined as follows: the columns of $B$ are
indexed by vectors $p = [p_1;...;p_m]\in\mathcal{P}$, and the column indexed by $p$ is the vector

$$B_p = [f_1(p_1);...;f_m(p_m)].$$

Note that assuming $m$, $\bar p_s$, $r_s$ moderate, matrix $B$ is simple: given $x\in\mathbf{R}^K$, it is easy to find $\overline{B}[x]$
and $\underline{B}[x]$ by Dynamic Programming.

Indeed, to identify $\overline{B}[x]$, $x = [x_1;...;x_m]\in\mathbf{R}^{r_1}\times...\times\mathbf{R}^{r_m}$ (identification of $\underline{B}[x]$ is
completely similar), it suffices to run for $s = m, m-1, ..., 1$ the backward Bellman recurrence

$$U_s(h) = \max_{r\in\mathbf{Z}}\{U_{s+1}(h - h_sr) + \langle f_s(r), x_s\rangle : 0\le r\le\bar p_s,\ 0\le h - h_sr\},$$

$$A_s(h)\in\mathrm{Argmax}_{r\in\mathbf{Z}}\{U_{s+1}(h - h_sr) + \langle f_s(r), x_s\rangle : 0\le r\le\bar p_s,\ 0\le h - h_sr\},\qquad h = 0,1,...,H,$$

with $U_{m+1}(\cdot)\equiv 0$, and then to recover one by one the entries $p_s$ in the index $p\in\mathcal{P}$ of $\overline{B}[x]$
from the forward Bellman recurrence

$$H_1 = H,\ p_1 = A_1(H_1);$$

$$H_{s+1} = H_s - h_sp_s,\ p_{s+1} = A_{s+1}(H_{s+1}),\ 1\le s<m.$$
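The backward and forward Bellman recurrences above translate directly into a short dynamic program. The following sketch uses 0-based stage indices and stores each output function $f_s$ as an array whose row $r$ is $f_s(r)$; these storage conventions, the function names and the small random instance are assumptions made for illustration. A brute-force enumeration over $\mathcal{P}$ is included only to cross-check the result on a tiny example.

```python
import itertools
import numpy as np

def best_knapsack_column(pbar, h, H, f, x):
    """Index p of the column of B maximizing <B_p, x>, and the maximal value."""
    m = len(pbar)
    U = np.zeros((m + 1, H + 1))             # U[m, .] = 0 is the terminal condition
    A = np.zeros((m, H + 1), dtype=int)      # A[s, b]: a maximizer at stage s, budget b
    for s in range(m - 1, -1, -1):           # backward recurrence
        scores = f[s] @ x[s]                 # scores[r] = <f_s(r), x_s>
        for b in range(H + 1):
            r_max = min(pbar[s], b // h[s])
            vals = [U[s + 1, b - h[s] * r] + scores[r] for r in range(r_max + 1)]
            A[s, b] = int(np.argmax(vals))
            U[s, b] = vals[A[s, b]]
    p, b = [], H
    for s in range(m):                       # forward recurrence
        p.append(int(A[s, b]))
        b -= h[s] * p[-1]
    return p, float(U[0, H])

def brute_force_value(pbar, h, H, f, x):
    """Maximal <B_p, x> by explicit enumeration of all feasible p (tiny cases only)."""
    best = -np.inf
    for p in itertools.product(*[range(b + 1) for b in pbar]):
        if sum(hs * ps for hs, ps in zip(h, p)) <= H:
            best = max(best, sum(f[s][ps] @ x[s] for s, ps in enumerate(p)))
    return float(best)

rng = np.random.default_rng(1)
pbar, h, H = [2, 3, 1], [1, 2, 3], 5
f = [rng.standard_normal((pb + 1, 2)) for pb in pbar]   # here r_s = 2 for all s
x = [rng.standard_normal(2) for _ in pbar]
p_best, val = best_knapsack_column(pbar, h, H, f, x)
val_check = brute_force_value(pbar, h, H, f, x)
val_of_p = float(sum(f[s][ps] @ x[s] for s, ps in enumerate(p_best)))
```

Replacing the maximization by minimization (or negating `x`) gives the column with the minimal inner product in the same way.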

2.6.3 Illustration: Attacker vs. Defender. 

The “covering story” we intend to consider is as follows (this story is a variation of what is called
“Colonel Blotto Game” in Game Theory, see, e.g., [2, 22] and references therein). Attacker and Defender
are preparing for a conflict to take place on $m$ battlefields. A pure strategy of Attacker is a vector
$a = [a_1;...;a_m]$, where the nonnegative integer $a_s$, $1\le s\le m$, is the number of attacking units to be
created and deployed at battlefield $s$; the only restrictions on $a$, aside from nonnegativity and
integrality, are the bounds $a_s\le\bar a_s$, $1\le s\le m$, and the budget constraint $\sum_{s=1}^m h_{sA}a_s\le H_A$ with
positive integer $h_{sA}$ and $H_A$. Similarly, a pure strategy of Defender is a vector $d = [d_1;...;d_m]$, where
the nonnegative integer $d_s$ is the number of defending units to be created and deployed at battlefield $s$,
and the only restrictions on $d$, aside from nonnegativity and integrality, are the bounds $d_s\le\bar d_s$,
$1\le s\le m$, and the budget constraint $\sum_{s=1}^m h_{sD}d_s\le H_D$ with positive integer $h_{sD}$ and $H_D$. The
total loss of Defender (the total gain of Attacker), the pure strategies of the players being $a$ and $d$, is

$$\sum_{s=1}^m\Omega^s_{a_s,d_s}$$

with given $(\bar a_s + 1)\times(\bar d_s + 1)$ matrices $\Omega^s$. Our goal is to solve in mixed strategies the matrix game
where Defender seeks to minimize his total loss, and Attacker seeks to maximize it.

Denoting by $\mathcal{A}$ and $\mathcal{D}$ the sets of pure strategies of Attacker, resp., Defender, representing

$$\Omega^s = \sum_{i=1}^{r_s}f^{i,s}[g^{i,s}]^T,\quad f^{i,s} = [f^{i,s}_0;...;f^{i,s}_{\bar a_s}],\ g^{i,s} = [g^{i,s}_0;...;g^{i,s}_{\bar d_s}],\ r_s = \mathrm{Rank}(\Omega^s),$$

and setting

$$K = \sum_{s=1}^m r_s,$$

$$A_a = \big[[f^{1,1}_{a_1};f^{2,1}_{a_1};...;f^{r_1,1}_{a_1}];[f^{1,2}_{a_2};f^{2,2}_{a_2};...;f^{r_2,2}_{a_2}];...;[f^{1,m}_{a_m};f^{2,m}_{a_m};...;f^{r_m,m}_{a_m}]\big]\in\mathbf{R}^K,\ a\in\mathcal{A},$$

$$D_d = \big[[g^{1,1}_{d_1};g^{2,1}_{d_1};...;g^{r_1,1}_{d_1}];[g^{1,2}_{d_2};g^{2,2}_{d_2};...;g^{r_2,2}_{d_2}];...;[g^{1,m}_{d_m};g^{2,m}_{d_m};...;g^{r_m,m}_{d_m}]\big]\in\mathbf{R}^K,\ d\in\mathcal{D},$$

we end up with a $K\times M$, $M = \mathrm{Card}(\mathcal{A})$, knapsack-generated matrix $A$ with columns $A_a$, $a\in\mathcal{A}$, and a
$K\times N$, $N = \mathrm{Card}(\mathcal{D})$, knapsack-generated matrix $D$ with columns $D_d$, $d\in\mathcal{D}$, such that

$$S := \Big[\sum_{s=1}^m\Omega^s_{a_s,d_s}\Big]_{a\in\mathcal{A},\,d\in\mathcal{D}} = A^TD.$$
As a result, solving the Attacker vs. Defender game in mixed strategies reduces to solving SP problem
(26) with knapsack-generated (and thus simple) matrices $A$, $D$ and thus can be reduced to convex-concave
SP (29) on the product of two $K$-dimensional convex compact sets. Note that in the situation
in question the design dimension $2K$ of (29) will, typically, be rather small (few tens or at most few
hundreds), while the design dimensions $M$, $N$ of the matrix game of interest (26) can be huge.

Numerical illustration. With the data (quite reasonable in terms of the “Attacker vs. Defender”
game)

$$
m = 8,\ h_{sA} = h_{sD} = 1,\ 1\le s\le m,\ H_A = H_D = 64 = \ldots,\ 1\le s\le m
$$

and rank 1 matrices $\Omega^s$, $1\le s\le m$, the design dimensions of the problem of interest (26) are as huge
as

$$
\dim w = \dim z = 97{,}082{,}021{,}465,
$$

while the sizes of problem (29) are just

$$
\dim u = \dim v = 8,
$$

and thus (29) can be easily solved to high accuracy by the Ellipsoid method. In the numerical
experiment we are about to report*, the outlined approach allowed us to solve (26) within $\epsilon_{\mathrm{sad}}$-inaccuracy
as small as 5.0e-9 in just 1537 steps of the Ellipsoid algorithm (110.0 sec on a medium quality laptop).
This performance is quite promising, especially when taking into account the huge - nearly $10^{11}$ - sizes of
the matrix game of interest (26).

3 From Saddle Point problems to Variational Inequalities with Monotone Operators

In what follows, we extend the decomposition approach (developed so far for convex-concave SP
problems) to Variational Inequalities (VIs) with monotone operators, with the primary goal to handle
VIs with affine monotone operators on LMO-represented domains.

3.1 Decomposition of Variational Inequalities with monotone operators 

3.1.1 Situation 

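Let $\mathcal{X}$, $\mathcal{H}$ be Euclidean spaces, $\Theta\subset\mathcal{X}\times\mathcal{H}$ be convex compact set, $\Xi$ be the projection of $\Theta$ onto $\mathcal{X}$,
and $H$ be the projection of $\Theta$ onto $\mathcal{H}$. Given $\xi\in\Xi$, $\eta\in H$, we set

$$
H_\xi = \{\eta : [\xi;\eta]\in\Theta\},\qquad \Xi_\eta = \{\xi : [\xi;\eta]\in\Theta\}.
$$

We denote a point from $\mathcal{X}\times\mathcal{H}$ as $\theta = [\xi;\eta]$ with $\xi\in\mathcal{X}$, $\eta\in\mathcal{H}$. Let, further,

$$
\Phi(\xi,\eta) = [\Phi_\xi(\xi,\eta);\ \Phi_\eta(\xi,\eta)] : \Theta\to\mathcal{X}\times\mathcal{H}
$$

be a continuous monotone vector field.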

* For implementation details, see section A.5.
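3.1.2 Induced vector fields


Let $\xi\in\Xi$, and let $\eta = \eta(\xi)$ be a somehow selected, as a function of $\xi\in\Xi$, strong solution to the VI
given by $(H_\xi, \Phi_\eta(\xi,\cdot))$, that is,

$$
\eta(\xi)\in H_\xi\ \ \&\ \ \langle\Phi_\eta(\xi,\eta(\xi)),\ \eta-\eta(\xi)\rangle\ge 0\ \ \forall\eta\in H_\xi. \qquad (30)
$$

Let us call $\Phi$ (more precisely, the pair $(\Phi,\eta(\cdot))$) $\eta$-regular, if for every $\xi\in\Xi$, there exists $\Psi = \Psi(\xi)\in\mathcal{X}$
such that

$$
\langle\Psi(\xi),\ \xi'-\xi\rangle\le\langle\Phi(\xi,\eta(\xi)),\ [\xi';\eta']-[\xi;\eta(\xi)]\rangle\ \ \forall[\xi';\eta']\in\Theta. \qquad (31)
$$

Similarly, let $\xi(\eta)$ be a somehow selected, as a function of $\eta\in H$, strong solution to the VI given by
$(\Xi_\eta, \Phi_\xi(\cdot,\eta))$, that is,

$$
\xi(\eta)\in\Xi_\eta\ \ \&\ \ \langle\Phi_\xi(\xi(\eta),\eta),\ \xi-\xi(\eta)\rangle\ge 0\ \ \forall\xi\in\Xi_\eta. \qquad (32)
$$

Let us call $(\Phi,\xi(\cdot))$ $\xi$-regular, if for every $\eta\in H$ there exists $\Gamma = \Gamma(\eta)\in\mathcal{H}$ such that

$$
\langle\Gamma(\eta),\ \eta'-\eta\rangle\le\langle\Phi(\xi(\eta),\eta),\ [\xi';\eta']-[\xi(\eta);\eta]\rangle\ \ \forall[\xi';\eta']\in\Theta. \qquad (33)
$$

When $(\Phi,\eta)$ is $\eta$-regular, we refer to the above $\Psi(\cdot)$ as to a primal vector field induced by $\Phi$*, and
when $(\Phi,\xi)$ is $\xi$-regular, we refer to the above $\Gamma(\cdot)$ as to a dual vector field induced by $\Phi$.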


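Example: Direct product case. This is the case where $\Theta = \Xi\times H$. In this situation, setting
$\Psi(\xi) = \Phi_\xi(\xi,\eta(\xi))$, we have for $[\xi';\eta']\in\Theta$:

$$
\langle\Phi(\xi,\eta(\xi)),\ [\xi';\eta']-[\xi;\eta(\xi)]\rangle = \langle\Phi_\xi(\xi,\eta(\xi)),\ \xi'-\xi\rangle + \underbrace{\langle\Phi_\eta(\xi,\eta(\xi)),\ \eta'-\eta(\xi)\rangle}_{\ge 0\ \text{by (30)}}\ \ge\ \langle\Psi(\xi),\ \xi'-\xi\rangle,
$$

that is, $(\Phi,\eta(\cdot))$ is $\eta$-regular, with $\Psi(\xi) = \Phi_\xi(\xi,\eta(\xi))$. Setting $\Gamma(\eta) = \Phi_\eta(\xi(\eta),\eta)$, we get by similar
argument

$$
\langle\Phi(\xi(\eta),\eta),\ [\xi';\eta']-[\xi(\eta);\eta]\rangle\ \ge\ \langle\Gamma(\eta),\ \eta'-\eta\rangle,\quad \xi'\in\Xi,\ \eta,\eta'\in H,
$$

that is, $(\Phi,\xi(\cdot))$ is $\xi$-regular, with $\Gamma(\eta) = \Phi_\eta(\xi(\eta),\eta)$.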


3.1.3 Main Result, Variational Inequality case 

3.1.4 Preliminaries 

Recall that the (Minty’s) variational inequality $\mathrm{VI}(M,W)$ associated with a convex compact subset
$W$ of Euclidean space $\mathcal{W}$ and a vector field $M : W\to\mathcal{W}$ is

$$
\text{find } w\in W:\ \langle M(w'),\ w'-w\rangle\ge 0\ \ \forall w'\in W; \qquad \mathrm{VI}(M,W)
$$

$w$ satisfying the latter condition is called a weak solution to the VI. A natural measure of inaccuracy
for an approximate solution $w\in W$ to $\mathrm{VI}(M,W)$ is the dual gap function

$$
\epsilon_{\mathrm{VI}}(w|M,W) = \sup_{w'\in W}\langle M(w'),\ w-w'\rangle;
$$

weak solutions to the VI are exactly the points of $W$ where this (clearly nonnegative everywhere on
$W$) function is zero.
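
As a small illustration of the dual gap function, consider the special case (an assumption made here only for the sake of the example) where $W$ is the probabilistic simplex and $M(w) = Sw+s$ with skew-symmetric $S$. Then $\langle Sw'+s, w-w'\rangle = \langle s,w\rangle - \langle Sw+s, w'\rangle$ is linear in $w'$, and the supremum is attained at a vertex. The function name below is hypothetical.

```python
import numpy as np

def dual_gap_skew_affine(S, s, w):
    """Dual gap sup_{w' in simplex} <S w' + s, w - w'> for skew-symmetric S.

    For skew-symmetric S the expression equals <s, w> - <S w + s, w'>,
    so the supremum over the simplex is <s, w> - min_j (S w + s)_j.
    """
    S = np.asarray(S, dtype=float)
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    if not np.allclose(S, -S.T):
        raise ValueError("this sketch assumes a skew-symmetric matrix S")
    Fw = S @ w + s
    return float(s @ w - Fw.min())


# rock-paper-scissors style skew-symmetric matrix (toy data)
S_rps = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
gap_uniform = dual_gap_skew_affine(S_rps, np.zeros(3), np.ones(3) / 3)
gap_vertex = dual_gap_skew_affine(S_rps, np.zeros(3), np.array([1.0, 0.0, 0.0]))
```

For a general monotone (not skew-symmetric) affine operator the supremum is a concave quadratic maximization and this shortcut does not apply.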

In the sequel we utilize the following simple fact originating from [19]: 

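* “a primal” instead of “the primal” reflects the fact that $\Psi$ is not uniquely defined by $\Phi$ - it is defined by $\Phi$ and $\eta$
and by how the values of $\Psi$ are selected when (31) does not specify these values uniquely.

Proposition 4. Let $M$ be monotone on $W$, let $\mathcal{I}_t = \{w_i\in W, M(w_i) : 1\le i\le t\}$ be a $t$-step
execution protocol associated with $(M,W)$, $\lambda$ be a $t$-step accuracy certificate, and $w^t = \sum_{i=1}^t\lambda_iw_i$ be
the associated approximate solution. Then

$$
\epsilon_{\mathrm{VI}}(w^t|M,W)\le\mathrm{Res}(\mathcal{I}_t,\lambda|W).
$$

Indeed, we have

$$
\begin{array}{rcll}
\mathrm{Res}(\mathcal{I}_t,\lambda|W) &=& \sup_{w'\in W}\left[\sum_{i=1}^t\lambda_i\langle M(w_i),\ w_i-w'\rangle\right]&\\
&\ge& \sup_{w'\in W}\left[\sum_{i=1}^t\lambda_i\langle M(w'),\ w_i-w'\rangle\right]&\text{[since $M$ is monotone]}\\
&=& \sup_{w'\in W}\langle M(w'),\ w^t-w'\rangle = \epsilon_{\mathrm{VI}}(w^t|M,W).&\Box
\end{array}
$$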
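
The inequality of Proposition 4 can be checked numerically on a toy instance. The sketch below assumes (for illustration only) that $W$ is the probabilistic simplex and $M(w) = Sw$ with skew-symmetric $S$, in which case both sides are computable by looking at vertices; the helper names are hypothetical.

```python
import numpy as np

def resolution_simplex(points, values, lam):
    """Res(I_t, lam | W) = max_{w' in simplex} sum_i lam_i <M(w_i), w_i - w'>.

    points: rows are the search points w_i; values: rows are M(w_i).
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    lam = np.asarray(lam, dtype=float)
    const = float(lam @ np.einsum("ij,ij->i", values, points))
    aggregated = lam @ values          # sum_i lam_i M(w_i)
    return const - float(aggregated.min())

def dual_gap_skew_linear(S, w):
    """sup_{w' in simplex} <S w', w - w'> = -min_j (S w)_j for skew-symmetric S."""
    return float(-(np.asarray(S, dtype=float) @ np.asarray(w, dtype=float)).min())


S = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
points = np.eye(3)[:2]        # w_1 = e_1, w_2 = e_2
values = points @ S.T         # rows M(w_i) = S w_i
lam = np.array([0.5, 0.5])    # accuracy certificate: nonnegative, sums to 1
w_t = lam @ points            # approximate solution w^t
res = resolution_simplex(points, values, lam)
gap = dual_gap_skew_linear(S, w_t)
```

In this skew-symmetric linear case the two quantities coincide; for a general monotone operator the resolution is only an upper bound on the dual gap.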


3.1.5 Main result 

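Proposition 5. In the situation of section 3.1.1, let $(\Phi,\eta(\cdot))$ be $\eta$-regular. Then

(i) Primal vector field $\Psi(\xi)$ induced by $(\Phi,\eta(\cdot))$ is monotone on $\Xi$. Moreover, whenever $\mathcal{I}_t = \{\xi_i\in\Xi, \Psi(\xi_i) : 1\le i\le t\}$
and $\mathcal{J}_t = \{\theta_i := [\xi_i;\eta(\xi_i)], \Phi(\theta_i) : 1\le i\le t\}$ and $\lambda$ is a $t$-step accuracy certificate,
it holds

$$
\epsilon_{\mathrm{VI}}\Big(\sum_{i=1}^t\lambda_i\theta_i\ \Big|\ \Phi,\Theta\Big)\le\mathrm{Res}(\mathcal{J}_t,\lambda|\Theta)\le\mathrm{Res}(\mathcal{I}_t,\lambda|\Xi). \qquad (34)
$$

(ii) Let $(\Phi,\xi)$ be $\xi$-regular, and let $\Gamma$ be the induced dual vector field. Whenever $\bar\theta = [\bar\xi;\bar\eta]\in\Theta$, we
have

$$
\epsilon_{\mathrm{VI}}(\bar\eta|\Gamma,H)\le\epsilon_{\mathrm{VI}}(\bar\theta|\Phi,\Theta). \qquad (35)
$$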


3.2 Implications 

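In the situation of section 3.1.1, assume that for properly selected $\eta(\cdot)$, $\xi(\cdot)$, $(\Phi,\eta(\cdot))$ is $\eta$-regular,
and $(\Phi,\xi(\cdot))$ is $\xi$-regular, induced primal and dual vector fields being $\Psi$ and $\Gamma$. In order to solve the
dual VI $\mathrm{VI}(\Gamma,H)$, we can apply to the primal VI $\mathrm{VI}(\Psi,\Xi)$ an algorithm with accuracy certificates; by
Proposition 5.i, resulting $t$-step execution protocol $\mathcal{I}_t = \{\xi_i, \Psi(\xi_i) : 1\le i\le t\}$ and accuracy certificate
$\lambda$ generate an execution protocol $\mathcal{J}_t = \{\theta_i := [\xi_i;\eta(\xi_i)], \Phi(\theta_i) : 1\le i\le t\}$ such that

$$
\mathrm{Res}(\mathcal{J}_t,\lambda|\Theta)\le\mathrm{Res}(\mathcal{I}_t,\lambda|\Xi),
$$

whence, by Proposition 4, for the approximate solution

$$
\theta^t = [\xi^t;\eta^t] := \sum_{i=1}^t\lambda_i\theta_i = \Big[\sum_{i=1}^t\lambda_i\xi_i;\ \sum_{i=1}^t\lambda_i\eta(\xi_i)\Big]
$$

it holds

$$
\epsilon_{\mathrm{VI}}(\theta^t|\Phi,\Theta)\le\mathrm{Res}(\mathcal{I}_t,\lambda|\Xi).
$$

Invoking Proposition 5.ii, we conclude that $\eta^t$ is a feasible solution to the dual VI $\mathrm{VI}(\Gamma,H)$, and

$$
\epsilon_{\mathrm{VI}}(\eta^t|\Gamma,H)\le\mathrm{Res}(\mathcal{I}_t,\lambda|\Xi). \qquad (36)
$$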

We are about to present two examples well suited for the just outlined approach. 
3.2.1 Solving affine monotone VI on LMO-represented domain

Let $H$ be a convex compact set in $\mathcal{H} = \mathbf{R}^N$, and let $H$ be equipped with an LMO. Assume that we
want to solve the VI $\mathrm{VI}(F,H)$, where

$$
F(\eta) = S\eta + s
$$

is an affine monotone operator (so that $S + S^T\succeq 0$). Let us set $\mathcal{X} = \mathcal{H}$, select $\Xi$ as a proximal-friendly
convex compact set containing $H$, and set $\Theta = \Xi\times H$,

$$
\Phi(\xi,\eta) = \begin{bmatrix} S^T & -S^T\\ S & 0\end{bmatrix}\begin{bmatrix}\xi\\ \eta\end{bmatrix} + \begin{bmatrix}0\\ s\end{bmatrix}.
$$

We have

$$
\begin{bmatrix} S^T & -S^T\\ S & 0\end{bmatrix} + \begin{bmatrix} S^T & -S^T\\ S & 0\end{bmatrix}^T = \begin{bmatrix} S+S^T & 0\\ 0 & 0\end{bmatrix}\succeq 0,
$$

so that $\Phi$ is an affine monotone operator with

$$
\Phi_\xi(\xi,\eta) = S^T\xi - S^T\eta,\qquad \Phi_\eta(\xi,\eta) = S\xi + s.
$$

Setting $\xi(\eta) = \eta$, we ensure that $\xi(\eta)\in\Xi$ when $\eta\in H$ and $\Phi_\xi(\xi(\eta),\eta) = 0$, implying (32). Since we
are in the direct product case, we can set $\Gamma(\eta) = \Phi_\eta(\xi(\eta),\eta) = S\eta + s = F(\eta)$; thus, $\mathrm{VI}(\Gamma,H)$ is our
initial VI of interest. On the other hand, setting

$$
\eta(\xi)\in\mathop{\mathrm{Argmin}}_{\eta\in H}\langle S\xi + s,\ \eta\rangle,
$$

we ensure (30). Since we are in the direct product case, we can set

$$
\Psi(\xi) = \Phi_\xi(\xi,\eta(\xi)) = S^T[\xi - \eta(\xi)];
$$

note that the values of $\Psi$ can be straightforwardly computed via calls to the LMO representing $H$. We
can now solve $\mathrm{VI}(\Psi,\Xi)$ by a proximal algorithm $\mathcal{B}$ with accuracy certificates and recover, as explained
above, approximate solution to the VI of interest $\mathrm{VI}(F,H)$. With the Non-Euclidean Restricted
Memory Level method with certificates [4] or Mirror Descent with certificates (see, e.g., [17]), the
approach results in a non-asymptotical $O(1/\sqrt{t})$-converging algorithm for solving the VI of interest,
with explicitly computable factors hidden in $O(\cdot)$. This complexity bound, completely similar to the
one obtained in [17], seems to be the best known under the circumstances.
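
To make the construction concrete, here is a minimal sketch of how one value of the induced primal field $\Psi(\xi) = S^T[\xi-\eta(\xi)]$ could be computed with a single LMO call. It assumes, purely for illustration, that $H$ is the probabilistic simplex (so that the LMO returns a vertex); the function names `lmo_simplex` and `primal_field` are hypothetical.

```python
import numpy as np

def lmo_simplex(c):
    """Assumed LMO: a vertex of the probability simplex minimizing <c, eta>."""
    eta = np.zeros_like(c, dtype=float)
    eta[int(np.argmin(c))] = 1.0
    return eta

def primal_field(S, s, xi, lmo=lmo_simplex):
    """Return Psi(xi) = S^T (xi - eta(xi)) and eta(xi),
    where eta(xi) minimizes <S xi + s, eta> over H (one LMO call)."""
    eta = lmo(S @ xi + s)
    return S.T @ (xi - eta), eta


# toy data: S + S^T = 2 I is positive semidefinite, so F(eta) = S eta + s is monotone
S = np.array([[1.0, 2.0], [-2.0, 1.0]])
s = np.zeros(2)
psi, eta = primal_field(S, s, np.array([1.0, 0.0]))
```

The points $\eta(\xi_i)$ returned along the way are exactly what is averaged, with the certificate weights $\lambda_i$, to form the approximate solution $\eta^t$ of the VI of interest.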


3.2.2 Solving skew-symmetric VI on LMO-represented domain 

Let $H$ be an LMO-represented convex compact domain in $\mathcal{H} = \mathbf{R}^N$, and assume that we want to solve
$\mathrm{VI}(F,H)$, where

$$
F(\eta) = 2Q^TP\eta + f
$$

with $K\times N$ matrices $P$, $Q$ such that the matrix $Q^TP$ is skew-symmetric:

$$
Q^TP + P^TQ = 0. \qquad (37)
$$

Let $\mathcal{X} = \mathbf{R}^K\times\mathbf{R}^K$, let $\Xi_1$, $\Xi_2$ be two convex compact sets in $\mathbf{R}^K$ such that

$$
QH\subset\Xi_1,\quad -PH\subset\Xi_2. \qquad (38)
$$

Let us set $\Xi = \Xi_1\times\Xi_2$, and let

$$
\Phi(\xi = [\xi_1;\xi_2],\eta) = \begin{bmatrix} & I_K & P\\ -I_K & & Q\\ -P^T & -Q^T & \end{bmatrix}\begin{bmatrix}\xi_1\\ \xi_2\\ \eta\end{bmatrix} + \begin{bmatrix}0\\ 0\\ f\end{bmatrix}.
$$

Note that $\Phi$ is monotone and affine. Setting

$$
\xi(\eta) = [Q\eta;\ -P\eta]
$$

and invoking (38), we ensure (32); since we are in the direct product case, we can take, as the dual
induced vector field,

$$
\Gamma(\eta) = \Phi_\eta(\xi(\eta),\eta) = -P^T(Q\eta) - Q^T(-P\eta) + f = [Q^TP - P^TQ]\eta + f = 2Q^TP\eta + f = F(\eta)
$$

(the next to last equality is by (37)), so that the dual VI $\mathrm{VI}(\Gamma,H)$ is our VI of interest.

On the other hand, setting

$$
\eta(\xi = [\xi_1;\xi_2])\in\mathop{\mathrm{Argmin}}_{\eta\in H}\langle f - P^T\xi_1 - Q^T\xi_2,\ \eta\rangle,
$$

we ensure (30). Since we are in the direct product case, we can define primal vector field as

$$
\Psi(\xi = [\xi_1;\xi_2]) = \Phi_\xi([\xi_1;\xi_2],\eta([\xi_1;\xi_2])) = \begin{bmatrix}\xi_2 + P\eta(\xi)\\ -\xi_1 + Q\eta(\xi)\end{bmatrix}.
$$

Note that LMO for $H$ allows to compute the values of $\Psi$, and that $\Xi$ can be selected to be proximal-friendly.
We can now solve $\mathrm{VI}(\Psi,\Xi)$ by a proximal algorithm $\mathcal{B}$ with accuracy certificates and recover,
as explained above, approximate solution to the VI of interest $\mathrm{VI}(F,H)$. When the design dimension
$\dim\Xi$ of the primal VI is small, other choices of $\mathcal{B}$, like the Ellipsoid algorithm, are possible, and
in this case we can end up with linearly converging, with the converging ratio depending solely on
$\dim\Xi$, algorithm for solving the VI of interest. We are about to give a related example, which can be
considered as multi-player version of the “Attacker vs. Defender” game.
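
The following sketch evaluates the primal and dual induced fields of this subsection on toy data. It assumes, for illustration only, that $H$ is the probabilistic simplex in $\mathbf{R}^N$ (so the LMO is an argmin over coordinates) and builds a skew-symmetric $Q^TP$ by taking $P = BQ$ with skew-symmetric $B$; all names are hypothetical.

```python
import numpy as np

def primal_field_skew(P, Q, f, xi1, xi2):
    """Psi([xi1; xi2]) = [xi2 + P eta; -xi1 + Q eta], where eta minimizes
    <f - P^T xi1 - Q^T xi2, eta> over the simplex (assumed domain H)."""
    c = f - P.T @ xi1 - Q.T @ xi2
    eta = np.zeros(P.shape[1])
    eta[int(np.argmin(c))] = 1.0      # LMO call for the simplex
    psi = np.concatenate([xi2 + P @ eta, -xi1 + Q @ eta])
    return psi, eta

def dual_field_skew(P, Q, f, eta):
    """Gamma(eta) = Phi_eta(xi(eta), eta) with xi(eta) = [Q eta; -P eta]."""
    return -P.T @ (Q @ eta) - Q.T @ (-P @ eta) + f


B = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric
Q = np.eye(2)
P = B @ Q                                 # then Q^T P = Q^T B Q is skew-symmetric
f = np.zeros(2)
psi, eta = primal_field_skew(P, Q, f, np.array([1.0, 0.0]), np.zeros(2))
gamma = dual_field_skew(P, Q, f, np.array([0.3, 0.7]))
```

The dual field reproduces $F(\eta) = 2Q^TP\eta + f$, while each value of the primal field costs one LMO call plus two matrix-vector products.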


Example: Nash Equilibrium with pairwise interactions. Consider the situation as follows:
there are

• $L\ge 2$ players, $\ell$-th of them selecting a mixed strategy $w_\ell$ from probabilistic simplex $\Delta_{N_\ell}$ of
dimension $N_\ell$,

• encoding matrices $D_\ell$ of sizes $m_\ell\times N_\ell$, and loss matrices $M^{\ell\ell'}$ of sizes $m_\ell\times m_{\ell'}$ such that

$$
M^{\ell\ell'} + [M^{\ell'\ell}]^T = 0,\quad 1\le\ell,\ell'\le L.
$$

• The loss of $\ell$-th player depends on mixed strategies of the players according to

$$
\mathcal{L}_\ell(\eta := [w_1;...;w_L]) = \sum_{\ell'=1}^L w_\ell^TE^{\ell\ell'}w_{\ell'} + \langle g_\ell,\eta\rangle,\qquad E^{\ell\ell'} = D_\ell^TM^{\ell\ell'}D_{\ell'}.
$$

In other words, every pair of distinct players $\ell$, $\ell'$ are playing matrix game with matrix $E^{\ell\ell'}$, and
the loss of player $\ell$, up to a linear in $[w_1;...;w_L]$ function, is the sum, over the pairwise games
he is playing, of his losses in these games, the “coupling constraints” being expressed by the
requirement that every player uses the same mixed strategy in all pairwise games he is playing.
We have described convex Nash Equilibrium problem, meaning that for every $\ell$, $\mathcal{L}_\ell(w_1,...,w_L)$ is
convex (in fact, linear) in $w_\ell$, is jointly concave (in fact, linear) in $w^\ell := (w_1,...,w_{\ell-1},w_{\ell+1},...,w_L)$,
and $\sum_{\ell=1}^L\mathcal{L}_\ell(\eta)$ is the linear function $\langle g,\eta\rangle$, $g = \sum_\ell g_\ell$, and thus is convex. It is known (see, e.g.,
[19]) that Nash Equilibria in convex Nash problem are exactly the weak solutions to the VI given by
monotone operator

$$
F(\eta := [w_1;...;w_L]) = [\nabla_{w_1}\mathcal{L}_1(\eta);\ ...;\ \nabla_{w_L}\mathcal{L}_L(\eta)]
$$

on the domain

$$
H = \Delta_{N_1}\times...\times\Delta_{N_L}.
$$

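Let us set

$$
Q = \begin{bmatrix} D_1 & & &\\ & D_2 & &\\ & & \ddots &\\ & & & D_L\end{bmatrix},\qquad
P = \frac12\begin{bmatrix} M^{1,1}D_1 & M^{1,2}D_2 & \cdots & M^{1,L}D_L\\ M^{2,1}D_1 & M^{2,2}D_2 & \cdots & M^{2,L}D_L\\ \vdots & \vdots & & \vdots\\ M^{L,1}D_1 & M^{L,2}D_2 & \cdots & M^{L,L}D_L\end{bmatrix}.
$$

Then

$$
Q^TP = \frac12\begin{bmatrix} D_1^TM^{1,1}D_1 & D_1^TM^{1,2}D_2 & \cdots & D_1^TM^{1,L}D_L\\ D_2^TM^{2,1}D_1 & D_2^TM^{2,2}D_2 & \cdots & D_2^TM^{2,L}D_L\\ \vdots & \vdots & & \vdots\\ D_L^TM^{L,1}D_1 & D_L^TM^{L,2}D_2 & \cdots & D_L^TM^{L,L}D_L\end{bmatrix},
$$

so that $Q^TP$ is skew-symmetric due to $M^{\ell\ell'} = -[M^{\ell'\ell}]^T$. Besides this, we clearly have

$$
F(\eta := [w_1;...;w_L]) = 2Q^TP\eta + f,\quad f = [\nabla_{w_1}\langle g_1,[w_1;...;w_L]\rangle;\ ...;\ \nabla_{w_L}\langle g_L,[w_1;...;w_L]\rangle].
$$

Observe that if $D_\ell$ are simple, so are $Q$ and $P$.

Indeed, for $Q$ this is evident: to find the column of $Q$ which makes the largest inner product
with $x = [x_1;...;x_L]$, $\dim x_\ell = m_\ell$, it suffices to find, for every $\ell$, the column of $D_\ell$ which
makes the maximal inner product with $x_\ell$, and then to select the maximal of the resulting
$L$ inner products and the corresponding to this maximum column of $Q$. To maximize the
inner product of the same $x$ with columns of $P$, note that

$$
x^TP = \left[y_1^TD_1,\ ...,\ y_L^TD_L\right],\qquad y_\ell = \frac12\sum_{\ell'=1}^L[M^{\ell'\ell}]^Tx_{\ell'},
$$

so that to maximize the inner product of $x$ and the columns of $P$ means to find, for every
$\ell$, the column of $D_\ell$ which makes the maximal inner product with $y_\ell$, and then to select
the maximal of the resulting $L$ inner products and the corresponding to this maximum
column of $P$.
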
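We see that if $D_\ell$ are simple, we can use the approach from section 3.2.2 to approximate the solution
to the VI generated by $F$ on $H$. Note that in the case in question the dual gap function $\epsilon_{\mathrm{VI}}(\eta|F,H)$
admits a transparent interpretation in terms of the Nash Equilibrium problem we are solving: for
$\eta = [w_1;...;w_L]\in H$, we have

$$
\epsilon_{\mathrm{VI}}(\eta|F,H)\ \ge\ \epsilon_{\mathrm{Nash}}(\eta) := \sum_{\ell=1}^L\Big[\mathcal{L}_\ell(\eta) - \min_{w_\ell'\in\Delta_{N_\ell}}\mathcal{L}_\ell(w_1,...,w_{\ell-1},w_\ell',w_{\ell+1},...,w_L)\Big], \qquad (39)
$$

and the right hand side here is the sum, over the players, of the (nonnegative) incentives for a player $\ell$
to deviate from his strategy $w_\ell$ to another mixed strategy when all other players stick to their strategies
as given by $\eta$. Thus, small $\epsilon_{\mathrm{VI}}([w_1;...;w_L]|\cdot,\cdot)$ means small incentives for the players to deviate from
mixed strategies $w_\ell$.

Verification of (39) is immediate: denoting $f_\ell = \nabla_{w_\ell}\langle g_\ell,\eta\rangle$, by definition of $\epsilon_{\mathrm{VI}}$ we have
for every $\eta' = [w_1';...;w_L']\in H$:

$$
\begin{array}{rcl}
\epsilon_{\mathrm{VI}}(\eta|F,H)&\ge&\langle F(\eta'),\ \eta-\eta'\rangle = \sum_\ell\langle\nabla_{w_\ell}\mathcal{L}_\ell(\eta'),\ w_\ell-w_\ell'\rangle\\
&=&\sum_\ell\langle f_\ell,\ w_\ell-w_\ell'\rangle + \sum_{\ell,\ell'}\langle D_\ell^TM^{\ell\ell'}D_{\ell'}w_{\ell'}',\ w_\ell-w_\ell'\rangle\\
&=&\sum_\ell\langle f_\ell,\ w_\ell-w_\ell'\rangle + \sum_{\ell,\ell'}\langle D_\ell^TM^{\ell\ell'}D_{\ell'}w_{\ell'},\ w_\ell-w_\ell'\rangle\\
&&\text{[since $\sum_{\ell,\ell'}\langle D_\ell^TM^{\ell\ell'}D_{\ell'}z_{\ell'},\ z_\ell\rangle = 0$ due to $M^{\ell\ell'} = -[M^{\ell'\ell}]^T$]}\\
&=&\sum_\ell\langle\nabla_{w_\ell}\mathcal{L}_\ell(\eta),\ w_\ell-w_\ell'\rangle = \sum_\ell\left[\mathcal{L}_\ell(\eta)-\mathcal{L}_\ell(w_1,...,w_{\ell-1},w_\ell',w_{\ell+1},...,w_L)\right]\\
&&\text{[since $\mathcal{L}_\ell$ is affine in $w_\ell$]}
\end{array}
$$

and (39) follows.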
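
The quantity $\epsilon_{\mathrm{Nash}}$ is easy to evaluate once the pairwise matrices are available explicitly. The sketch below is only an illustration on small dense data: it assumes that the matrices $E^{\ell\ell'}$ are given for $\ell\ne\ell'$, that the diagonal terms vanish, and that the part of the linear term relevant for player $\ell$ is a vector `f[l]` multiplying $w_\ell$; the function name is hypothetical.

```python
import numpy as np

def nash_inaccuracy(E, f, w):
    """Sum over players of the incentive to deviate unilaterally.

    E[l][k]: matrix E^{l,k} of the pairwise game between players l and k
             (entries with l == k are ignored, assumed to contribute nothing);
    f[l]:    vector of the linear term in player l's own strategy;
    w[l]:    current mixed strategy of player l.
    """
    L = len(w)
    total = 0.0
    for l in range(L):
        grad = np.array(f[l], dtype=float)
        for k in range(L):
            if k != l:
                grad = grad + E[l][k] @ w[k]
        # the loss is affine in w_l, so the best deviation is a vertex of the simplex
        total += float(grad @ w[l] - grad.min())
    return total


# two players, rock-paper-scissors; R is skew-symmetric, so E^{2,1} = -R^T = R
R = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
E = [[None, R], [R, None]]
f = [np.zeros(3), np.zeros(3)]
e1 = np.array([1.0, 0.0, 0.0])
uniform = np.ones(3) / 3
eps_pure = nash_inaccuracy(E, f, [e1, e1])
eps_uniform = nash_inaccuracy(E, f, [uniform, uniform])
```

In the large-scale setting of the text the products $E^{\ell\ell'}w_{\ell'}$ would not be formed densely; the dense version is used here only to keep the example short.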


3.3 Relation to [17] 

Here we demonstrate that the decomposition approach to solving VIs with monotone operators on
LMO-represented domains covers the approach, based on Fenchel-type representations, developed in
[17]. Specifically, let $H$ be a compact convex set in Euclidean space $\mathcal{H}$, $G(\cdot)$ be a monotone
vector field on $H$, and $\eta\mapsto A\eta + a$ be an affine mapping from $\mathcal{H}$ to Euclidean space $\mathcal{X}$. Given
a convex compact set $\Xi\subset\mathcal{X}$, let us set

$$
\Theta = \Xi\times H,\quad \Phi(\xi,\eta) = [\Phi_\xi(\xi,\eta) := A\eta + a;\ \Phi_\eta(\xi,\eta) := G(\eta) - A^T\xi] : \Theta\to\mathcal{X}\times\mathcal{H}, \qquad (40)
$$

so that $\Phi$ clearly is a monotone vector field on $\Theta$. Assume that $\eta(\xi) : \Xi\to H$ is a somehow selected
strong solution to $\mathrm{VI}(\Phi_\eta(\xi,\cdot),H)$:

$$
\forall\xi\in\Xi:\ \eta(\xi)\in H\ \ \&\ \ \langle\underbrace{G(\eta(\xi)) - A^T\xi}_{\Phi_\eta(\xi,\eta(\xi))},\ \eta-\eta(\xi)\rangle\ge 0\ \ \forall\eta\in H; \qquad (41)
$$

(cf. (30)); note that required $\eta(\xi)$ definitely exists, provided that $G(\cdot)$ is continuous and monotone.
Let us also define $\xi(\eta)$ as a selection of the point-to-set mapping $\eta\mapsto\mathop{\mathrm{Argmin}}_{\xi\in\Xi}\langle A\eta + a,\xi\rangle$, so that

$$
\forall\eta\in H:\ \xi(\eta)\in\Xi\ \ \&\ \ \langle\underbrace{A\eta + a}_{\Phi_\xi(\xi(\eta),\eta)},\ \xi-\xi(\eta)\rangle\ge 0\ \ \forall\xi\in\Xi \qquad (42)
$$

(cf. (32)).

Observe that with the just defined $\Xi$, $H$, $\Theta$, $\Phi$, $\eta(\cdot)$, $\xi(\cdot)$ we are in the direct product case of the
situation described in section 3.1.1. Since we are in the direct product case, $(\Phi,\eta(\cdot))$ is $\eta$-regular, and
we can take, as the induced primal vector field associated with $(\Phi,\eta(\cdot))$, the vector field

$$
\Psi(\xi) = A\eta(\xi) + a = \Phi_\xi(\xi,\eta(\xi)) : \Xi\to\mathcal{X}, \qquad (43)
$$

and as the induced dual vector field, the field

$$
\Gamma(\eta) = G(\eta) - A^T\xi(\eta) = \Phi_\eta(\xi(\eta),\eta) : H\to\mathcal{H}.
$$

Note that in terms of [17], relations (43) and (41), modulo notation, form what in the reference is
called a Fenchel-type representation (F.-t.r.) of vector field $\Psi : \Xi\to\mathcal{X}$, the data of the representation
being $\mathcal{H}$, $A$, $a$, $\eta(\cdot)$, $G(\cdot)$, $H$. On a closer inspection, every F.-t.r. of a given monotone vector field
$\Psi : \Xi\to\mathcal{X}$ can be obtained in this fashion from some setup of the form (40).

Assume now that $\Xi$ is LMO-representable, and we have at our disposal $G$-oracle which, given on
input $\eta\in H$, returns $G(\eta)$. This oracle combines with LMO for $\Xi$ to induce a procedure which, given on
input $\eta\in H$, returns $\Gamma(\eta)$. As a result, we can apply the decomposition machinery presented in sections
3.1, 3.2 to reduce solving $\mathrm{VI}(\Psi,\Xi)$ to processing $\mathrm{VI}(\Gamma,H)$ by an algorithm with accuracy certificates.
It can be easily seen by inspection that this reduction recovers constructions and results presented in
[17, sections 1 - 4]. The bottom line is that the decomposition-based approach to solving VIs with
monotone operators on LMO-represented domains, as developed in sections 3.1, 3.2, essentially covers the
approach of [17] based on Fenchel-type representations of monotone vector fields.
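A Appendix

A.1 Proof of Lemma 1

It suffices to prove the $\phi$-related statements. Lipschitz continuity of $\phi$ in the direct product case is
evident. Further, the function $\theta(x_1,x_2;y_1) = \max_{y_2\in Y_2[y_1]}\Phi(x_1,x_2;y_1,y_2)$ is convex and Lipschitz
continuous in $x = [x_1;x_2]\in X$ for every $y_1\in Y_1$, whence $\phi(x_1,y_1) = \min_{x_2\in X_2[x_1]}\theta(x_1,x_2;y_1)$ is convex
and lower semicontinuous in $x_1\in X_1$ (note that $X$ is compact). On the other hand,

$$
\phi(x_1,y_1) = \max_{y_2\in Y_2[y_1]}\min_{x_2\in X_2[x_1]}\Phi(x_1,x_2;y_1,y_2) = \max_{y_2\in Y_2[y_1]}\Big[\chi(x_1;y_1,y_2) := \min_{x_2\in X_2[x_1]}\Phi(x_1,x_2;y_1,y_2)\Big],
$$

and $\chi(x_1;y_1,y_2)$ is concave and Lipschitz continuous in $y = [y_1;y_2]\in Y$ for every $x_1\in X_1$, whence

$$
\phi(x_1,y_1) = \max_{y_2\in Y_2[y_1]}\chi(x_1;y_1,y_2)
$$

is concave and upper semicontinuous in $y_1\in Y_1$ (note that $Y$ is compact).
Next, we have

$$
\begin{array}{rcll}
\mathrm{SadVal}(\phi,X_1,Y_1)&=&\inf\limits_{x_1\in X_1}\Big[\sup\limits_{y_1\in Y_1}\Big[\sup\limits_{y_2:[y_1;y_2]\in Y}\ \inf\limits_{x_2:[x_1;x_2]\in X}\Phi(x_1,x_2;y_1,y_2)\Big]\Big]&\\
&=&\inf\limits_{x_1\in X_1}\Big[\sup\limits_{[y_1;y_2]\in Y}\ \inf\limits_{x_2:[x_1;x_2]\in X}\Phi(x_1,x_2;y_1,y_2)\Big]&\\
&=&\inf\limits_{x_1\in X_1}\Big[\inf\limits_{x_2:[x_1;x_2]\in X}\ \sup\limits_{[y_1;y_2]\in Y}\Phi(x_1,x_2;y_1,y_2)\Big]&\text{[by Sion-Kakutani Theorem]}\\
&=&\inf\limits_{[x_1;x_2]\in X}\ \sup\limits_{[y_1;y_2]\in Y}\Phi(x_1,x_2;y_1,y_2) = \mathrm{SadVal}(\Phi,X,Y),&
\end{array}
$$

as required in (2). Finally, let $x = [x_1;x_2]\in X$ and $y = [y_1;y_2]\in Y$. We have

$$
\begin{array}{rcll}
\overline{\phi}(x_1) - \mathrm{SadVal}(\phi,X_1,Y_1)&=&\overline{\phi}(x_1) - \mathrm{SadVal}(\Phi,X,Y)&\text{[by (2)]}\\
&=&\sup\limits_{y_1\in Y_1}\phi(x_1,y_1) - \mathrm{SadVal}(\Phi,X,Y)&\\
&=&\sup\limits_{y_1\in Y_1}\ \sup\limits_{y_2:[y_1;y_2]\in Y}\ \inf\limits_{x_2:[x_1;x_2]\in X}\Phi(x_1,x_2;y_1,y_2) - \mathrm{SadVal}(\Phi,X,Y)&\\
&=&\sup\limits_{[y_1;y_2]\in Y}\ \inf\limits_{x_2:[x_1;x_2]\in X}\Phi(x_1,x_2;y_1,y_2) - \mathrm{SadVal}(\Phi,X,Y)&\\
&=&\inf\limits_{x_2:[x_1;x_2]\in X}\ \sup\limits_{y=[y_1;y_2]\in Y}\Phi(x_1,x_2;y) - \mathrm{SadVal}(\Phi,X,Y)&\\
&\le&\sup\limits_{y=[y_1;y_2]\in Y}\Phi(x_1,x_2;y) - \mathrm{SadVal}(\Phi,X,Y)&\\
&=&\overline{\Phi}(x) - \mathrm{SadVal}(\Phi,X,Y)&
\end{array}
$$

and

$$
\begin{array}{rcll}
\mathrm{SadVal}(\phi,X_1,Y_1) - \underline{\phi}(y_1)&=&\mathrm{SadVal}(\Phi,X,Y) - \underline{\phi}(y_1)&\text{[by (2)]}\\
&=&\mathrm{SadVal}(\Phi,X,Y) - \inf\limits_{x_1\in X_1}\phi(x_1,y_1)&\\
&=&\mathrm{SadVal}(\Phi,X,Y) - \inf\limits_{x_1\in X_1}\Big[\inf\limits_{x_2:[x_1;x_2]\in X}\ \sup\limits_{y_2:[y_1;y_2]\in Y}\Phi(x_1,x_2;y_1,y_2)\Big]&\\
&=&\mathrm{SadVal}(\Phi,X,Y) - \inf\limits_{x=[x_1;x_2]\in X}\ \sup\limits_{y_2:[y_1;y_2]\in Y}\Phi(x;y_1,y_2)&\\
&\le&\mathrm{SadVal}(\Phi,X,Y) - \inf\limits_{x=[x_1;x_2]\in X}\Phi(x;y_1,y_2)&\\
&=&\mathrm{SadVal}(\Phi,X,Y) - \underline{\Phi}(y).&
\end{array}
$$

We conclude that

$$
\begin{array}{rcl}
\epsilon_{\mathrm{sad}}([x_1;y_1]|\phi,X_1,Y_1)&=&\left[\overline{\phi}(x_1) - \mathrm{SadVal}(\phi,X_1,Y_1)\right] + \left[\mathrm{SadVal}(\phi,X_1,Y_1) - \underline{\phi}(y_1)\right]\\
&\le&\left[\overline{\Phi}(x) - \mathrm{SadVal}(\Phi,X,Y)\right] + \left[\mathrm{SadVal}(\Phi,X,Y) - \underline{\Phi}(y)\right] = \epsilon_{\mathrm{sad}}([x;y]|\Phi,X,Y),
\end{array}
$$

as claimed in (3). □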


A.2 Proof of Lemma 2 

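For $x_1\in X_1$ we have

$$
\begin{array}{rcll}
\phi(x_1;\bar y_1)&=&\min\limits_{x_2:[x_1;x_2]\in X}\ \max\limits_{y_2:[\bar y_1;y_2]\in Y}\Phi(x_1,x_2;\bar y_1,y_2)\ \ge\ \min\limits_{x_2:[x_1;x_2]\in X}\Phi(x_1,x_2;\bar y_1,\bar y_2)&\\
&\ge&\min\limits_{x_2:[x_1;x_2]\in X}\left[\Phi(\bar x;\bar y) + \langle G,\ [x_1;x_2]-[\bar x_1;\bar x_2]\rangle\right]&\text{[since $\Phi(x;\bar y)$ is convex and $G\in\partial_x\Phi(\bar x;\bar y)$]}\\
&\ge&\phi(\bar x_1;\bar y_1) + \langle g,\ x_1-\bar x_1\rangle&\text{[by definition of $g$, $G$],}
\end{array}
$$

as claimed in (a). “Symmetric” reasoning justifies (b). □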

A.3 Proof of Lemma 3 

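Assume that (5) holds true. Then $G$ clearly is certifying, implying that

$$
\chi_G(\bar x_1) = \langle G,\ [\bar x_1;\bar x_2]\rangle,
$$

and therefore (5) reads

$$
\langle G,\ [x_1;x_2]\rangle\ge\chi_G(\bar x_1) + \langle g,\ x_1-\bar x_1\rangle\ \ \forall x = [x_1;x_2]\in X,
$$

whence, taking minimum in the left hand side over $x_2\in X_2[x_1]$,

$$
\chi_G(x_1)\ge\chi_G(\bar x_1) + \langle g,\ x_1-\bar x_1\rangle\ \ \forall x_1\in X_1,
$$

as claimed in (ii).

Now assume that (i) and (ii) hold true. By (i), $\chi_G(\bar x_1) = \langle G,\ [\bar x_1;\bar x_2]\rangle$, and by (ii) combined with
the definition of $\chi_G$,

$$
\forall x = [x_1;x_2]\in X:\ \langle G,\ [x_1;x_2]\rangle\ge\chi_G(x_1)\ge\chi_G(\bar x_1) + \langle g,\ x_1-\bar x_1\rangle = \langle G,\bar x\rangle + \langle g,\ x_1-\bar x_1\rangle,
$$

implying (5). □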






A.4 Dynamic Programming generated simple matrices 

Consider the situation as follows. There exists an evolving in time system $\mathcal{S}$, with state $\xi_s$ at time
$s = 1, 2, ..., m$ belonging to a given finite nonempty set $\Xi_s$. Further, every pair $(\xi,s)$ with $s\in\{1,...,m\}$,
$\xi\in\Xi_s$ is associated with nonempty finite set of actions $A^s_\xi$, and we set

$$
\mathcal{S}_s = \{(\xi,a) : \xi\in\Xi_s,\ a\in A^s_\xi\}.
$$

Further, for every $s$, $1\le s<m$, a transition mapping $\pi_s(\xi,a) : \mathcal{S}_s\to\Xi_{s+1}$ is given. Finally, we are
given vector-valued functions (“outputs”) $\chi_s : \mathcal{S}_s\to\mathbf{R}^{r_s}$.

A trajectory of $\mathcal{S}$ is a sequence $\{(\xi_s,a_s) : 1\le s\le m\}$ such that $(\xi_s,a_s)\in\mathcal{S}_s$ for $1\le s\le m$ and

$$
\xi_{s+1} = \pi_s(\xi_s,a_s),\ 1\le s<m.
$$

The output of a trajectory $\tau = \{(\xi_s,a_s) : 1\le s\le m\}$ is the block-vector

$$
\chi[\tau] = [\chi_1(\xi_1,a_1);\ ...;\ \chi_m(\xi_m,a_m)].
$$

We can associate with $\mathcal{S}$ the matrix $D = D[\mathcal{S}]$ with $K = r_1 + ... + r_m$ rows and with columns indexed
by the trajectories of $\mathcal{S}$; specifically, the column indexed by a trajectory $\tau$ is $\chi[\tau]$.

For example, knapsack generated matrix $D$ associated with knapsack data from section
2.6.2 is of the form $D[\mathcal{S}]$ with system $\mathcal{S}$ as follows:

• $\Xi_s$, $s = 1,...,m$, is the set of nonnegative integers which are $\le H$;

• $A^s_\xi$ is the set of nonnegative integers $a$ such that $a\le\bar p_s$ and $\xi - h_sa\ge 0$;

• the transition mappings are $\pi_s(\xi,a) = \xi - ah_s$;

• the outputs are $\chi_s(\xi,a) = f_s(a)$, $1\le s\le m$.

In the notation from section 2.6.2, vectors $[p_1;...;p_m]\in\mathcal{P}$ are exactly the sequences of
actions $a_1,...,a_m$ stemming from the trajectories of the just defined system $\mathcal{S}$.


Observe that matrix $D = D[\mathcal{S}]$ is simple, provided the cardinalities of $\Xi_s$ and $A^s_\xi$ are reasonable.
Indeed, given $x = [x_1;...;x_m]\in\mathbf{R}^K = \mathbf{R}^{r_1}\times...\times\mathbf{R}^{r_m}$, we can identify $D[x]$ by Dynamic Programming,
running first the backward Bellman recurrence

$$
\left.\begin{array}{rcl}
U_s(\xi)&=&\max\limits_{a\in A^s_\xi}\left\{x_s^T\chi_s(\xi,a) + U_{s+1}(\pi_s(\xi,a))\right\}\\
A_s(\xi)&=&\mathop{\mathrm{Argmax}}\limits_{a\in A^s_\xi}\left\{x_s^T\chi_s(\xi,a) + U_{s+1}(\pi_s(\xi,a))\right\}
\end{array}\right\},\ \xi\in\Xi_s,\ s = m, m-1, ..., 1
$$

(where $U_{m+1}(\cdot)\equiv 0$), and then identifying the (trajectory indexing the) column of $D$ corresponding
to $D[x]$ by running the forward Bellman recurrence

$$
\xi_1\in\mathop{\mathrm{Argmax}}_{\xi\in\Xi_1}U_1(\xi)\ \Rightarrow\ a_1\in A_1(\xi_1)\ \Rightarrow\ ...\ \Rightarrow\ \xi_{s+1} = \pi_s(\xi_s,a_s)\ \Rightarrow\ a_{s+1}\in A_{s+1}(\xi_{s+1})\ \Rightarrow\ ...,\quad s = 1, 2, ..., m-1.
$$
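
The backward and forward recurrences for a general system $\mathcal{S}$ can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the callback-based interface (`actions`, `transition`, `output`) and the function name `dp_lmo` are assumptions made for the example, and stages are indexed from 0.

```python
def dp_lmo(x, states, actions, transition, output):
    """Find a trajectory maximizing sum_s <x[s], output(s, xi_s, a_s)>.

    states[s]:            iterable of admissible states at stage s
    actions(s, xi):       iterable of admissible actions at state xi
    transition(s, xi, a): state at stage s + 1
    output(s, xi, a):     vector chi_s(xi, a), given as a sequence of numbers
    Returns (optimal value, list of (state, action) pairs).
    """
    m = len(states)
    U = [dict() for _ in range(m + 1)]     # U[m] plays the role of U_{m+1} = 0
    best = [dict() for _ in range(m)]
    # backward Bellman recurrence
    for s in range(m - 1, -1, -1):
        for xi in states[s]:
            cands = []
            for a in actions(s, xi):
                val = sum(xc * oc for xc, oc in zip(x[s], output(s, xi, a)))
                if s + 1 < m:
                    val += U[s + 1][transition(s, xi, a)]
                cands.append((val, a))
            U[s][xi], best[s][xi] = max(cands, key=lambda t: t[0])
    # forward recurrence: recover the maximizing trajectory
    xi = max(states[0], key=lambda z: U[0][z])
    value, trajectory = U[0][xi], []
    for s in range(m):
        a = best[s][xi]
        trajectory.append((xi, a))
        if s + 1 < m:
            xi = transition(s, xi, a)
    return value, trajectory


# knapsack system from the example above, with toy data (assumed)
H, h, pbar = 3, [1, 2], [2, 1]
value, trajectory = dp_lmo(
    x=[[1.0], [3.0]],
    states=[range(H + 1), range(H + 1)],
    actions=lambda s, xi: range(min(pbar[s], xi // h[s]) + 1),
    transition=lambda s, xi, a: xi - a * h[s],
    output=lambda s, xi, a: [float(a)],
)
```

The column $D[x]$ itself is the concatenation of the outputs along the returned trajectory; the effort is proportional to the total number of (state, action) pairs.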
A.5 Attacker vs. Defender via Ellipsoid algorithm


In our implementation, 

1. Relation (38) is ensured by specifying U, V as centered at the origin Euclidean balls of radius 
R, where R is an upper bound on the Euclidean norms of the columns in D and in A (such a 
bound can be easily obtained from the knapsack data specifying the matrices D, A). 

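2. We processed the monotone vector field associated with the primal SP problem (29), that is, the
field

$$
F(u,v) = [F_u(u,v) = A[u] - v;\ F_v(u,v) = u - D[v]]
$$

by Ellipsoid algorithm with accuracy certificates from [19]. For $\tau = 1, 2, ...$, the algorithm
generates search points $[u_\tau;v_\tau]\in\mathbf{R}^K\times\mathbf{R}^K$, with $[u_1;v_1] = 0$, along with execution protocols
$\mathcal{I}^\tau = \{[u_i;v_i], F(u_i,v_i) : i\in I_\tau\}$, where $I_\tau = \{i\le\tau : [u_i;v_i]\in U\times V\}$, augmented by accuracy
certificates $\lambda^\tau = \{\lambda^\tau_i\ge 0 : i\in I_\tau\}$ such that $\sum_{i\in I_\tau}\lambda^\tau_i = 1$. From the results of [19] it follows
that for every $\epsilon>0$ it holds

$$
\tau\ge N(\epsilon) := O(1)K^2\ln\left(\frac{R+\epsilon}{\epsilon}\right)\ \Rightarrow\ \mathrm{Res}(\mathcal{I}^\tau,\lambda^\tau|U\times V)\le\epsilon. \qquad (44)
$$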


3. When computing $F(u_i,v_i)$ (this computation takes place only at productive steps - those with
$[u_i;v_i]\in U\times V$), we get, as a byproduct, the columns $A^i = A[u_i]$ and $D^i = D[v_i]$ of matrices $A$,
$D$, along with the indexes $a^i$, $d^i$ of these columns (recall that these indexes are pure strategies
of Attacker and Defender and thus, according to the construction of $A$, $D$, are collections of $m$
nonnegative integers). In our implementation, we stored these columns, same as their indexes
and the corresponding search points $[u_i;v_i]$. As is immediately seen, in the case in question the
approximate solution $[w^\tau;z^\tau]$ to the SP problem of interest (26) induced by execution protocol
$\mathcal{I}^\tau$ and accuracy certificate $\lambda^\tau$ is comprised of two sparse vectors

$$
w^\tau = \sum_{i\in I_\tau}\lambda^\tau_i\delta^D_{d^i},\qquad z^\tau = \sum_{i\in I_\tau}\lambda^\tau_i\delta^A_{a^i}, \qquad (45)
$$

where $\delta^D_d$ is the “$d$-th basic orth” in the simplex $\Delta_N$ of probabilistic vectors with entries indexed
by pure strategies of Defender, and similarly for $\delta^A_a$. Thus, we have no difficulties with
representing our approximate solutions*, in spite of their huge ambient dimension.

According to our general theory and (44), the number of steps needed to get an $\epsilon$-solution $[w;z]$ to
the problem of interest (i.e., a feasible solution with $\epsilon_{\mathrm{sad}}([w;z]|\psi,W,Z)\le\epsilon$) does not exceed $N(\epsilon)$,
with computational effort per step dominated by the necessity to identify $A[u_i]$, $D[v_i]$ by Dynamic
Programming.
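
The sparse representation (45) amounts to accumulating certificate weights on the pure strategies met at productive steps. A minimal sketch, with a hypothetical function name and with pure strategies stored as tuples of integers, could look as follows.

```python
from collections import defaultdict

def sparse_mixed_strategy(weights, pure_strategies):
    """Aggregate sum_i lam_i * delta_{strategy_i} into {strategy: probability}.

    weights:         certificate weights lam_i (nonnegative, summing to 1)
    pure_strategies: pure strategy recorded at the i-th productive step,
                     e.g. a tuple of m nonnegative integers
    """
    mix = defaultdict(float)
    for lam, strategy in zip(weights, pure_strategies):
        if lam > 0:
            mix[tuple(strategy)] += lam
    return dict(mix)


# toy data: the same pure strategy is met at two productive steps
mix = sparse_mixed_strategy([0.5, 0.25, 0.25], [(1, 0), (0, 1), (1, 0)])
```

The number of stored entries never exceeds the number of productive steps, regardless of how many pure strategies the player has.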

In fact, we used the outlined scheme with two straightforward modifications.

• First, instead of building the accuracy certificates $\lambda^\tau$ according to the rules from [19], we used
the best, given execution protocols $\mathcal{I}^\tau$, accuracy certificates by solving the convex program

$$
\min_\lambda\left\{\mathrm{Res}(\mathcal{I}^\tau,\lambda|U\times V) := \max_{y\in U\times V}\sum_{i\in I_\tau}\lambda_i\langle F(u_i,v_i),\ [u_i;v_i]-y\rangle\ :\ \lambda_i\ge 0,\ \sum_{i\in I_\tau}\lambda_i = 1\right\} \qquad (46)
$$

In our implementation, this problem was solved from time to time, once per a fixed number of
steps. Note that with $U$, $V$ being Euclidean balls, (46) is well within the scope of Matlab Convex
Programming solver CVX [12].

* Note that applying Caratheodory theorem, we could further “compress” the representations of approximate solutions -
make these solutions convex combinations of at most $K+1$ of $\delta^D_{d^i}$'s and $\delta^A_{a^i}$'s.

• Second, given current approximate solution (45) to the problem of interest, we can compute its
saddle point inaccuracy exactly instead of upper-bounding it by $\mathrm{Res}(\mathcal{I}^\tau,\lambda^\tau|U\times V)$. Indeed, it
is immediately seen that

$$
\epsilon_{\mathrm{sad}}([w^\tau;z^\tau]|\psi,W,Z) = \mathrm{Max}\Big(A^T\Big[\sum_{i\in I_\tau}\lambda^\tau_iD^i\Big]\Big) - \mathrm{Min}\Big(D^T\Big[\sum_{i\in I_\tau}\lambda^\tau_iA^i\Big]\Big).
$$

In our implementation, we performed this computation each time when a new accuracy certificate
was computed, and terminated the solution process when the saddle point inaccuracy became
less than a given threshold (1.e-4).
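
The exact inaccuracy formula above needs only two "largest/smallest inner product with a column" computations. The sketch below assumes these are supplied as callables (in the setting of the text they would be implemented by Dynamic Programming over the knapsack data; here, for illustration, they are dense maxima over tiny explicit matrices). All names are hypothetical.

```python
import numpy as np

def exact_saddle_inaccuracy(lam, A_cols, D_cols, max_over_A, min_over_D):
    """Max(A^T [sum_i lam_i D^i]) - Min(D^T [sum_i lam_i A^i]).

    A_cols, D_cols: K x t arrays whose i-th columns are the stored A^i, D^i
    max_over_A(x):  returns max over columns A_a of <A_a, x>
    min_over_D(x):  returns min over columns D_d of <D_d, x>
    """
    lam = np.asarray(lam, dtype=float)
    d_bar = np.asarray(D_cols, dtype=float) @ lam   # sum_i lam_i D^i
    a_bar = np.asarray(A_cols, dtype=float) @ lam   # sum_i lam_i A^i
    return max_over_A(d_bar) - min_over_D(a_bar)


# toy "matching pennies" instance with explicit small matrices (assumed data)
A = np.eye(2)
D = np.array([[1.0, -1.0], [-1.0, 1.0]])
max_over_A = lambda x: float((A.T @ x).max())
min_over_D = lambda x: float((D.T @ x).min())
eps_mixed = exact_saddle_inaccuracy([0.5, 0.5], A, D, max_over_A, min_over_D)
eps_pure = exact_saddle_inaccuracy([1.0, 0.0], A, D, max_over_A, min_over_D)
```

Since the two aggregated vectors live in $\mathbf{R}^K$, this check costs two oracle calls and never touches the full game matrix.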

A.6 Proof of Proposition 5 

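(i): Let $\xi_1,\xi_2\in\Xi$, and let $\eta_1 = \eta(\xi_1)$, $\eta_2 = \eta(\xi_2)$. By (31) we have

$$
\begin{array}{rcl}
\langle\Psi(\xi_2),\ \xi_2-\xi_1\rangle&\ge&\langle\Phi(\xi_2,\eta_2),\ [\xi_2-\xi_1;\ \eta_2-\eta_1]\rangle\\
\langle\Psi(\xi_1),\ \xi_1-\xi_2\rangle&\ge&\langle\Phi(\xi_1,\eta_1),\ [\xi_1-\xi_2;\ \eta_1-\eta_2]\rangle
\end{array}
$$

Summing inequalities up, we get

$$
\langle\Psi(\xi_2)-\Psi(\xi_1),\ \xi_2-\xi_1\rangle\ge\langle\Phi(\xi_2,\eta_2)-\Phi(\xi_1,\eta_1),\ [\xi_2-\xi_1;\ \eta_2-\eta_1]\rangle\ge 0,
$$

so that $\Psi$ is monotone.

Further, the first inequality in (34) is due to Proposition 4. To prove the second inequality in (34),
let $\mathcal{I}_t = \{\xi_i\in\Xi, \Psi(\xi_i) : 1\le i\le t\}$, $\mathcal{J}_t = \{\theta_i := [\xi_i;\eta(\xi_i)], \Phi(\theta_i) : 1\le i\le t\}$, and let $\lambda$ be $t$-step
accuracy certificate. We have

$$
\begin{array}{l}
\theta = [\xi;\eta]\in\Theta\ \Rightarrow\\
\qquad\sum_{i=1}^t\lambda_i\langle\Phi(\theta_i),\ \theta_i-\theta\rangle\ \le\ \sum_{i=1}^t\lambda_i\langle\Psi(\xi_i),\ \xi_i-\xi\rangle\ \ \text{[see (31)]}\\
\qquad\le\ \mathrm{Res}(\mathcal{I}_t,\lambda|\Xi)\\
\Rightarrow\ \mathrm{Res}(\mathcal{J}_t,\lambda|\Theta) = \sup_{\theta=[\xi;\eta]\in\Theta}\sum_{i=1}^t\lambda_i\langle\Phi(\theta_i),\ \theta_i-\theta\rangle\ \le\ \mathrm{Res}(\mathcal{I}_t,\lambda|\Xi).
\end{array}
$$

(i) is proved.

(ii): Let $\eta\in H$. Invoking (33), we have

$$
\langle\Gamma(\eta),\ \bar\eta-\eta\rangle\le\langle\Phi(\xi(\eta),\eta),\ [\bar\xi;\bar\eta]-[\xi(\eta);\eta]\rangle\le\epsilon_{\mathrm{VI}}(\bar\theta|\Phi,\Theta),
$$

and (35) follows. □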